NUCLEIC ACID MOLECULES AND OTHER MOLECULES 
ASSOCIATED WITH SOYBEAN CYST NEMATODE RESISTANCE 

CROSS-REFERENCE TO RELATED APPLICATION 
This application claims priority under 35 U.S.C. §119(e) of U.S. Application No. 
60/174,880, filed January 7, 2000, the disclosure of which is herein incorporated by reference in 
its entirety. 

FIELD OF THE INVENTION 
The present invention is in the field of soybean genetics. More specifically, the invention 
relates to nucleic acid molecules from regions of the soybean genome, which are associated with 
soybean cyst nematode (SCN) resistance. The invention also relates to proteins encoded by such 
nucleic acid molecules as well as antibodies capable of recognizing these proteins. The 
invention also relates to nucleic acid markers from regions of the soybean genome, which are 
associated with SCN resistance. Moreover, the invention relates to uses of such molecules, 
including transforming SCN sensitive soybean with constructs containing nucleic acid 
molecules from regions of the soybean genome which are associated with SCN resistance. 
Furthermore, the invention relates to the use of such molecules in a plant breeding program. 

BACKGROUND OF THE INVENTION 
The soybean, Glycine max (L.) Merrill (Glycine max or soybean), is one of the major 
economic crops grown worldwide as a primary source of vegetable oil and protein (Sinclair and 
Backman, Compendium of Soybean Diseases, 3rd Ed. APS Press, St. Paul, MN, p. 106 (1989)). 
The growing demand for low cholesterol and high fiber diets has also increased soybean's 
importance as a health food. 

Prior to 1940, soybean cultivars were either direct releases of introductions brought from 
Asia or pure line selections from genetically diverse plant introductions. The soybean plant was 
primarily used as a hay crop in the early part of the 19th century. Only a few introductions were 
large-seeded types useful for feed grain and oil production. From the mid 1930's through the 
1960's, gains in soybean seed yields were achieved by changing the breeding method from 
evaluation and selection of introduced germplasm to crossing elite by elite lines. The continuous 
cycle of cross hybridizing the elite strains selected from the progenies of previous crosses 
resulted in the modern day cultivars. 

Over 10,000 soybean strains have now been introduced into the United States since the 
early 1900's (Bernard et al., United States National Germplasm Collections, In: L.D. Hill (ed.), 
World Soybean Research, pp. 286-289. Interstate Printers and Publ., Danville, IL (1976)). A 
limited number of those introductions form the genetic base of cultivars developed from the 
hybridization and selection programs (Johnson and Bernard, The Soybean, Norman Ed., 
Academic Press, N.Y., pp. 1-73 (1963)). For example, in a survey conducted by Specht and 
Williams, Genetic Contributions, Fehr eds. American Soil Association, Wisconsin, pp. 49-73 
(1984), for the 136 cultivars released from 1939 to 1989, only 16 different introductions were the 
source of cytoplasm for 121 of that 136. Certain soybean strains are sensitive to one or more 
pathogens. One economically important pathogen is SCN. 

SCN accounts for roughly 40% of the total disease in soybean and can result in 
significant yield losses (up to 90%). SCN is the most destructive pest of soybean to date and 
accounts for an estimated yield loss of up to $809 million annually. Currently, the most 
cost effective control measures are crop rotation and the use of host plant resistance. While 
breeders have successfully developed SCN resistant soybean lines, breeding is both difficult and 
time consuming due to the complex and polygenic nature of resistance. The resistance is often 
race specific and does not provide stability over time due to changing SCN populations in the 
field. In addition, many of the resistant soybean varieties carry a significant yield penalty when 
grown in the absence of SCN. 

SCN, Heterodera glycines Ichinohe, was identified on soybeans in the United States in 
1954 at Castle Hayne, N.C. (Winstead et al., Plant Dis. Rep. 39:9-11 (1955)). Since its discovery, 
SCN has been recognized as one of the most destructive pests in soybean. It has been 
reported in nearly all states in which soybeans are grown, and it causes major production 
problems in several states, being particularly destructive in the Midwestern states. See generally: 
Caldwell et al., Agron. J. 52:635-636 (1960); Rao-Arelli and Anand, Crop Sci. 28:650-652 
(1988); Baltazar and Mansur, Soybean Genet. Newsl. 19:120-122 (1992); Concibido et al., 
Crop Sci. (1993). For example, sensitive soybean cultivars had 5.7-35.8% lower seed yields 
than did resistant cultivars on SCN race 3 infested sites in Iowa (Niblack and Norton, Plant Dis. 
75:943-948 (1992)). 

Shortly after the discovery of SCN in the United States, sources of SCN resistance were 
identified (Ross and Brim, Plant Dis. Rep. 41:923-924 (1957)). Some lines, such as Peking and 
Plant Introduction (PI) PI88788, were quickly incorporated into breeding programs. Peking 
became widely used as a source of resistance due to its lack of agronomically undesirable traits, 
with Pickett as the first SCN resistant cultivar released (Brim and Ross, Crop Sci. 6:305 (1966)). 
The recognition that certain SCN populations could overcome resistant cultivars led to 
an extensive screen for additional sources of SCN resistance. PI88788 emerged as a popular 
source of race 3 and 4 resistance even though it had a cyst index greater than 10% (but less than 
20%) against race 4, and Peking and its derivatives emerged as a popular source for races 1 and 
3. PI437654 was subsequently identified as having resistance to all known races and its SCN 
resistance was backcrossed into Forrest. Currently there are more than 130 PIs known to have 
SCN resistance. 

SCN race 3 is considered to be the prominent race in the Midwestern soybean producing 
states. Considerable effort has been devoted to the genetics and breeding for resistance to race 3. 
While both Peking and PI88788 are resistant to SCN race 3, classical genetics studies suggest 
that they harbor different genes for race 3 resistance (Rao-Arelli and Anand, Crop Sci. 28:650- 
652 (1988)). Crosses between PI88788 (R) and Essex (S) segregate 9(R):55(S) in the F2 
population and 1(R):26(Seg):37(S) families in the F3 generation, suggesting that resistance to 
race 3 in PI88788 is conditioned by one recessive and two dominant genes, whereas Peking and 
PI90763 resistance is conditioned by one dominant and two recessive genes. Based on 
reciprocal crosses, Peking, Forrest, and PI90763 have genes in common for resistance to SCN 
race 3 (Rao-Arelli and Anand, Crop Sci. 28:650-652 (1988)). A cross between Peking and 
PI88788 segregates 13(R):3(S) in the F2 generation, indicating a major difference between the 
parents for race 3 resistance. Generation mean analysis based on four crosses between resistant 
and sensitive genotypes, A20 (R), Jack (R), Cordell (R) and A2234 (S), suggests that an additive 
genetic model is sufficient to explain most of the genetic variation of race 3 SCN resistance in 
each cross, while the analysis of the pooled data indicates the presence of dominant effects as 
well (Mansur, Carriquiry and Rao-Arelli, Crop Sci. 33:1249-1253 (1993)). This analysis further 
indicates that race 3 resistance is probably under the genetic control of three, but not more than 
four genes. 
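
The 9(R):55(S) F2 ratio and the 1(R):26(Seg):37(S) F3 family ratio cited above for the PI88788 x Essex cross can be checked by enumerating genotypes. The following Python sketch is illustrative only: it assumes three unlinked loci that assort independently, complete dominance at each locus, and F3 families obtained by selfing individual F2 plants. None of these assumptions is stated in the cited studies, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# F2 genotype frequencies at a single locus after selfing a heterozygous F1.
# The labels "AA"/"Aa"/"aa" are generic and are reused for each of three loci.
F2_FREQ = {"AA": Fraction(1, 4), "Aa": Fraction(1, 2), "aa": Fraction(1, 4)}


def is_resistant(rec, dom1, dom2):
    """Resistant: homozygous recessive at the first locus and at least one
    dominant allele at each of the other two loci (assumed model)."""
    return rec == "aa" and dom1 != "aa" and dom2 != "aa"


def breeds_true(rec, dom1, dom2):
    """Selfed progeny are all resistant only if every locus is fixed."""
    return rec == "aa" and dom1 == "AA" and dom2 == "AA"


def can_yield_resistant(rec, dom1, dom2):
    """Selfed progeny include at least some resistant plants."""
    return rec != "AA" and dom1 != "aa" and dom2 != "aa"


def f2_plant_ratio():
    """Return (resistant, sensitive) F2 plants out of 64."""
    resistant = sum(
        F2_FREQ[a] * F2_FREQ[b] * F2_FREQ[c]
        for a, b, c in product(F2_FREQ, repeat=3)
        if is_resistant(a, b, c)
    )
    return int(resistant * 64), int((1 - resistant) * 64)


def f3_family_ratio():
    """Return (all resistant, segregating, all sensitive) F3 families out of 64."""
    fixed = Fraction(0)
    segregating = Fraction(0)
    for a, b, c in product(F2_FREQ, repeat=3):
        freq = F2_FREQ[a] * F2_FREQ[b] * F2_FREQ[c]
        if breeds_true(a, b, c):
            fixed += freq
        elif can_yield_resistant(a, b, c):
            segregating += freq
    sensitive = 1 - fixed - segregating
    return tuple(int(x * 64) for x in (fixed, segregating, sensitive))


print(f2_plant_ratio())   # resistant : sensitive plants in the F2
print(f3_family_ratio())  # resistant : segregating : sensitive F3 families
```

Under these assumptions the enumeration gives 9:55 for F2 plants and 1:26:37 for F3 families, which is the arithmetic behind the one-recessive, two-dominant interpretation.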

RFLP analysis of segregating populations between resistant and sensitive lines, PI209332 
(R), PI90763 (R), PI88788 (R), Peking (R) and Evan (S), identified a major SCN resistance QTL 
(rhg1) which maps to linkage group G (Concibido et al., Theor. Appl. Genet. 93:234-241 
(1996)). In this study, rhg1 explains 51.4% of the phenotypic variation in PI209332, 52.7% of 
the variation in PI90763, 40.0% of the variation in PI88788 and 28.1% of the variation in Peking. 
This major resistance QTL was assumed to be one and the same in all of the mapping populations 
employed. However, as pointed out by the authors, it is possible that the genomic interval 
contains distinct but tightly linked QTLs. In a related study using PI209332 as the source of 
resistance, Concibido et al., Crop Sci. 36:1643-1650 (1996), show that a QTL on linkage group 
G (rhg1) is effective against the three SCN races tested, explaining 35% of the phenotypic 
variation to race 1, 50% of the variation to race 3, and 54% of the variation to race 6. In addition 
to the major QTL on linkage group G, 4 other QTLs mapping to linkage groups D, J, L and K 
were identified, with some of the resistance loci behaving in a race specific manner. 

Concibido et al. (Crop Sci. 37:258-264 (1997)) found significant association of marker 
C006V to a major QTL on linkage group G (rhg1) and resistance to race 1, race 3 and race 6 in 
Peking and PI90763 (Evan X Peking, Evan X PI90763) and races 3 and 6 in PI88788 (Evan X 
PI88788), in agreement with the previous study based on the PI209332 source of resistance 
(Concibido et al., Crop Sci. 36:1643-1650 (1996)). The resistance locus near C006V was 
effective against all races tested in all of the resistance sources. While statistically significant 
against all races, this locus accounts for different proportions of the total phenotypic variation 
with the races tested. For example, in PI90763 the resistance locus near C006V explains more 
than three times the phenotypic variation against race 1 than against race 3. The variability can 
be attributed to differences in the genetic backgrounds, variability among the SCN populations or 
may be a reflection of the limited size of the plant populations which were employed. This study 
further identified three additional independent SCN resistance QTLs: one near the RFLP marker 
A378H mapping to the opposite end of linkage group G from C006V (rhg1), one near the marker 
B032V-1 on linkage group J and a third linked to A280Hae-1 on linkage group N. Comparisons 
between the different SCN races indicated that some of the putative SCN QTLs behave in a race 
specific manner. 

PI437654 was identified as having resistance to all known races. Based on analysis of 
328 recombinant inbred lines (RIL) derived from a cross between PI437654 and BSR101, 
Webb reported six QTLs associated with SCN resistance on linkage groups A2, C1, G, M, L25 
and L26 (U.S. Patent 5,491,081). An allele on linkage group G, presumed to be rhg1, is 
involved with certain SCN races tested (races 1, 2, 3, 5 and 14), and has the largest reported 
phenotypic effect on resistance to every race. In contrast, the QTLs on linkage groups A2, C1, 
M, L25 and L26 act in a race specific manner. The QTL on linkage group L25 was reportedly 
involved with four of the five races, while the QTLs on linkage groups A2, C1 and L26 were 
each involved in resistance to two of the five races (U.S. Patent 5,491,081). Webb further 
reports data indicating that the resistance to any of the five races is likely to result from the 
combined effects of the QTL involved in each race (U.S. Patent 5,491,081). 

Qui et al. (Theor. Appl. Genet. 98:356-364 (1999)) screened 200 F2-3 families derived from 
a cross between Peking and Essex and identified RFLP markers which are associated with SCN 
resistance QTLs on linkage groups B, E, I and H. The three QTLs on linkage groups B, E and H 
jointly account for 57.7% of the phenotypic variation to race 1, the QTLs on linkage groups H 
and B account for 21.4% of the variation to race 3, while the QTLs on linkage groups I and E are 
associated with resistance to race 5, accounting for 14.0% of the phenotypic variation. In contrast 
to previous mapping studies which use Peking as the source of resistance, no significant 
association was detected to the rhg1 locus on linkage group G. The authors point out that the 
marker Bng122, which has been shown to have significant linkage to rhg1, is not polymorphic in 
the population employed (Concibido et al., Crop Sci. 36:1643-1650 (1996)). 

It has been reported that the rhg1 locus on linkage group G is necessary for the 
development of resistance to any of the SCN races. There have been efforts to develop 
molecular markers to identify breeding lines harboring the rhg1 SCN resistant allele. One of the 
most commonly used markers for marker assisted selection (MAS) of rhg1 is an SSR locus that 
co-segregates and maps roughly 0.4 cM from rhg1. This SSR marker, BARC-Satt_309, is able to 
distinguish most, if not all, of the SCN sensitive genotypes from those harboring rhg1 from 
important sources of resistance such as Peking and PI437654. Two simple sequence repeat 
markers have been reported that can be used to select for SCN resistance at the rhg1 locus 
(Concibido et al., Theor. Appl. Genet. 99:811-818 (1999)). Satt_309 was also effective in 
distinguishing SCN resistant sources PI88788 and PI209332 from many, but not all, sensitive 
genotypes. In particular, Satt_309 cannot be used for MAS in populations developed from 
"typical" southern US cultivars (e.g., Lee, Bragg and Essex) crossed with resistance sources 
PI88788 or PI209332. 
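
To give a sense of what a map distance of roughly 0.4 cM means for marker assisted selection, the short Python sketch below converts a map distance into an expected recombination fraction per meiosis. It assumes Haldane's mapping function (no crossover interference), which is a modelling choice made here for illustration and is not taken from the cited studies; the function name is hypothetical.

```python
import math


def haldane_recombination_fraction(distance_cm):
    """Expected recombination fraction between two loci separated by
    distance_cm centimorgans, under Haldane's mapping function
    r = (1 - exp(-2d)) / 2 with d expressed in Morgans (assumed model)."""
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


# Marker-to-locus distance quoted in the text (roughly 0.4 cM).
r = haldane_recombination_fraction(0.4)
print(f"recombination fraction per meiosis: {r:.4f}")
```

At this distance the recombination fraction is close to 0.004, so under the stated assumption the marker allele and the linked allele would be expected to be separated in roughly 4 of every 1,000 meioses.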

Matson and Williams have reported a dominant SCN resistance locus, Rhg4, which is 
tightly linked to the i locus on linkage group A2 (Matson and Williams, Crop Sci. 5:447 
(1965)). The QTL reported by Webb on linkage group A2 maps near the i locus and is 
considered to be Rhg4 (U.S. Patent 5,491,081). Webb concludes that only two loci on linkage 
groups A2 (Rhg4) and G (rhg1) explain the genetic variation to race 3. 

SUMMARY OF THE INVENTION 
The present invention includes and provides a method for the production of a soybean plant 
having an rhg1 SCN resistant allele comprising: (A) crossing a first soybean plant having an 
rhg1 SCN resistant allele with a second soybean plant having an rhg1 SCN sensitive allele to 
produce a segregating population; (B) screening the segregating population for a member having 
an rhg1 SCN resistant allele with a first nucleic acid molecule capable of specifically hybridizing 
to linkage group G, wherein the first nucleic acid molecule specifically hybridizes to a second 
nucleic acid molecule that is linked to the rhg1 SCN resistant allele; and, (C) selecting the 
member for further crossing and selection. 

The present invention includes and provides a method of investigating an rhg1 haplotype 
of a soybean plant comprising: (A) isolating nucleic acid molecules from the soybean plant; (B) 
determining the nucleic acid sequence of an rhg1 allele or part thereof; and, (C) comparing the 
nucleic acid sequence of the rhg1 allele or part thereof to a reference nucleic acid sequence. 

The present invention includes and provides a method of introgressing SCN resistance or partial 
SCN resistance into a soybean plant comprising: performing marker assisted selection of the 
soybean plant with a nucleic acid marker, wherein the nucleic acid marker specifically hybridizes 
with a nucleic acid molecule having a first nucleic acid sequence that is physically linked to a 
second nucleic acid sequence that is located on linkage group G of soybean A3244, wherein the 
second nucleic acid sequence is within 500 kb of a third nucleic acid sequence which is capable 
of specifically hybridizing with the nucleic acid sequence of SEQ ID NO: 5, 6, complements 
thereof, or fragments thereof having at least 15 nucleotides; and, selecting the soybean plant 
based on the marker assisted selection. 

The present invention includes and provides a method for the production of a soybean plant 
having an Rhg4 SCN resistant allele comprising: (A) crossing a first soybean plant having an 
Rhg4 SCN resistant allele with a second soybean plant having an Rhg4 SCN sensitive allele to 
produce a segregating population; (B) screening the segregating population for a member having 
an Rhg4 SCN resistant allele with a first nucleic acid molecule capable of specifically 
hybridizing to linkage group A2, wherein the first nucleic acid molecule specifically hybridizes 
to a second nucleic acid molecule linked to the Rhg4 SCN resistant allele; and, (C) selecting the 
member for further crossing and selection. 

The present invention includes and provides a method of investigating an Rhg4 haplotype 
of a soybean plant comprising: (A) isolating nucleic acid molecules from the soybean plant; (B) 
determining the nucleic acid sequence of an Rhg4 allele or part thereof; and (C) comparing the 
nucleic acid sequence of the Rhg4 allele or part thereof to a reference nucleic acid sequence. 

The present invention includes and provides a method of introgressing SCN resistance or 
partial SCN resistance into a soybean plant comprising: performing marker assisted selection of 
the soybean plant with a nucleic acid marker, wherein the nucleic acid marker specifically 
hybridizes with a nucleic acid molecule having a first nucleic acid sequence that is physically 
linked to a second nucleic acid sequence that is located on linkage group A2 of soybean A3244, 
wherein the second nucleic acid sequence is within 500 kb of a third nucleic acid sequence which 
specifically hybridizes with the nucleic acid sequence of SEQ ID NO: 7, complements thereof, or 
fragments thereof having at least 15 nucleotides; and, selecting the soybean plant based on the 
marker assisted selection. 

The present invention includes and provides a substantially purified nucleic acid 
molecule comprising a nucleic acid sequence selected from the group consisting of SEQ ID NOs: 
5, 6, 8-23, 28-43, complements thereof, and fragments of either. 

The present invention includes and provides a substantially purified first nucleic acid 
molecule with nucleic acid sequence which specifically hybridizes to a second nucleic acid 
molecule having a nucleic acid sequence selected from the group consisting of a complement of 
SEQ ID NOs: 5, 6, 8-23, 28-43. 

The present invention includes and provides a substantially purified nucleic acid 
molecule comprising a nucleic acid sequence selected from the group consisting of SEQ ID NOs: 
7, 44-47, and 50-53, complements thereof, and fragments of either. 

The present invention includes and provides a substantially purified first nucleic acid 
molecule with nucleic acid sequence which specifically hybridizes to a second nucleic acid 
molecule having a nucleic acid sequence selected from the group consisting of a complement of 
SEQ ID NOs: 50-53. 

The present invention includes and provides a substantially purified protein or fragment 
thereof comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 
1097, 1098, and 1100-1115 and fragments thereof. 

The present invention includes and provides a substantially purified protein or fragment 
thereof comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 
1099 and 1116-1119 and fragments thereof. 

The present invention includes and provides a transformed plant having a nucleic acid 
molecule which comprises: (A) an exogenous promoter region which functions in a plant cell to 
cause the production of a mRNA molecule; (B) a structural nucleic acid molecule encoding a 
protein or fragment thereof comprising an amino acid sequence selected from the group 
consisting of SEQ ID NOs: 1097, 1100, 1098, 1101, 1102-1115; and (C) a 3' non-translated 
sequence that functions in the plant cell to cause termination of transcription and addition of 
polyadenylated ribonucleotides to a 3' end of the mRNA molecule. 

The present invention includes and provides a transformed plant having a nucleic acid 
molecule which comprises: (A) an exogenous promoter region which functions in a plant cell to 
cause the production of a mRNA molecule; (B) a structural nucleic acid molecule encoding a 
protein or fragment thereof comprising an amino acid sequence selected from the group 
consisting of SEQ ID NOs: 1099, 1116-1119; and (C) a 3' non-translated sequence that functions 
in the plant cell to cause termination of transcription and addition of polyadenylated 
ribonucleotides to a 3' end of the mRNA molecule. 

The present invention includes and provides a transgenic seed having a nucleic acid 
molecule which comprises: (A) an exogenous promoter region which functions to cause the 
production of a mRNA molecule; (B) a structural nucleic acid molecule encoding a protein or 
fragment thereof comprising an amino acid sequence selected from the group consisting of SEQ 
ID NOs: 1097, 1100, 1098, 1101, 1102-1115; and (C) a 3' non-translated sequence that functions 
to cause termination of transcription and addition of polyadenylated ribonucleotides to a 3' end 
of the mRNA molecule. 

The present invention includes and provides a transgenic seed having a nucleic acid 
molecule which comprises: (A) an exogenous promoter region which functions to cause the 
production of a mRNA molecule; (B) a structural nucleic acid molecule encoding a protein or 
fragment thereof comprising an amino acid sequence selected from the group consisting of SEQ 
ID NOs: 1099, 1116-1119; and (C) a 3' non-translated sequence that functions to cause 
termination of transcription and addition of polyadenylated ribonucleotides to a 3' end of the 
mRNA molecule. 

DESCRIPTION OF THE FIGURES 
Figure 1 is an amino acid sequence alignment of the leucine rich repeat domain of rhg1. 
Figure 2 is an amino acid sequence alignment of the leucine rich repeat domain of Rhg4. 

DESCRIPTION OF THE SEQUENCE LISTINGS 

The following sequence listings form part of the present specification and are included to 
further demonstrate certain aspects of the present invention. The invention may be better 
understood by reference to one or more of these sequences in combination with the detailed 
description presented herein. 

SEQ ID NOs: 1-7 and 1097-1099 all refer to sequences from the line A3244. 

SEQ ID NO: 1 is sequence ID 515O02_region_G2 from line A3244, and is adjacent to 
the contig containing rhg1. 

SEQ ID NO: 2 is sequence ID 240O17_region_G3 from line A3244, and contains the 
rhg1, v.1 four exon gene at coding coordinates 45163-45314, 45450-45509, 46941-48763, 
48975-49573. The amino acid translation for SEQ ID NO: 2 is SEQ ID NO: 1097. 

SEQ ID NO: 3 is sequence ID 240O17_region_G3 from line A3244, and contains the 
rhg1, v.2 two exon gene at coding coordinates 46798-48763 and 48975-49573. The amino acid 
translation for SEQ ID NO: 3 is SEQ ID NO: 1098. 






04983.0216.NPUS01/38-10(15810)B 



SEQ ID NO: 4 is sequence ID 318013_region_A3 from line A3244, contains the Rhg4 
gene at coding coordinates 111805-113968 and 114684-115204, and has an amino acid 
translation of SEQ ID NO: 1099. 

SEQ ID NO: 5 is sequence ID 240O17_region_G3_8_mRNA, and comprises the two 
rhg1, v.2 exons from the coding sequence portion of SEQ ID NO: 3. 

SEQ ID NO: 6 is sequence ID 240O17_region_G3_8_cds, and comprises the four rhg1, 
v.1 exons from the coding sequence portion of SEQ ID NO: 2. 

SEQ ID NO: 7 is sequence ID 318013_region_A3_17_cds, and comprises the Rhg4 
coding sequence portion from SEQ ID NO: 4. 

SEQ ID NOs: 8-43 and 1100-1115 all refer to rhg1 sequences. 

SEQ ID NO: 8 is sequence ID rhg1_A3244_amplicon from line A3244, contains four 
rhg1, v.1 exons at coding coordinates 113-264, 400-459, 1891-3713, and 3925-4523, and has an 
amino acid translation of SEQ ID NO: 1100 and 1097. 

SEQ ID NO: 9 is sequence ID rhg1_A3244_amplicon, contains two rhg1, v.2 exons at 
coding coordinates 1748-3713 and 3925-4523, and has an amino acid translation of SEQ ID NO: 
1101 and 1098. 

SEQ ID NO: 10 is sequence ID rhg1_peking_amplicon from the line peking, contains 
four rhg1, v.1 exons at coding coordinates 113-264, 400-459, 1888-3710, and 3903-4501, and 
has an amino acid translation of SEQ ID NO: 1102. 

SEQ ID NO: 11 is sequence ID rhg1_peking_amplicon, contains two rhg1, v.2 exons at 
coding coordinates 1745-3710 and 3903-4501, and has an amino acid translation of SEQ ID NO: 
1103. 

SEQ ID NO: 12 is sequence ID rhg1_toyosuzu_amplicon from the line toyosuzu, 
contains four rhg1, v.1 exons at coding coordinates 113-264, 400-459, 1890-3712, and 3924- 
4522, and has an amino acid translation of SEQ ID NO: 1104. 

SEQ ID NO: 13 is sequence ID rhg1_toyosuzu_amplicon, contains two rhg1, v.2 exons at 
coding coordinates 1747-3712 and 3924-4522, and has an amino acid translation of SEQ ID NO: 
1105. 

SEQ ID NO: 14 is sequence ID rhg1_will_amplicon from the line will, contains four 
rhg1, v.1 exons at coding coordinates 113-264, 400-459, 1891-3713, and 3925-4523, and has an 
amino acid translation of SEQ ID NO: 1106. 

SEQ ID NO: 15 is sequence ID rhg1_will_amplicon, contains two rhg1, v.2 exons at 
coding coordinates 1748-3713 and 3925-4523, and has an amino acid translation of SEQ ID NO: 
1107. 

SEQ ID NO: 16 is sequence ID rhg1_a2704_amplicon from the line A2704, contains four 
rhg1, v.1 exons at coding coordinates 113-264, 400-459, 1891-3713, and 3925-4523, and has an 
amino acid translation of SEQ ID NO: 1108. 

SEQ ID NO: 17 is sequence ID rhg1_a2704_amplicon, contains two rhg1, v.2 exons at 
coding coordinates 1748-3713 and 3925-4523, and has an amino acid translation of SEQ ID NO: 
1109. 

SEQ ID NO: 18 is sequence ID rhg1_noir_amplicon from the line noir, contains four 
rhg1, v.1 exons at coding coordinates 113-264, 400-459, 1876-3698, and 3910-4508, and has an 
amino acid translation of SEQ ID NO: 1110. 

SEQ ID NO: 19 is sequence ID rhg1_noir_amplicon, contains two rhg1, v.2 exons at 
coding coordinates 1733-3698 and 3910-4508, and has an amino acid translation of SEQ ID NO: 
1111. 

SEQ ID NO: 20 is sequence ID rhg1_lee_amplicon from the line lee, contains four rhg1, 
v.1 exons at coding coordinates 113-264, 400-459, 1876-3698, and 3910-4508, and has an amino 
acid translation of SEQ ID NO: 1112. 

SEQ ID NO: 21 is sequence ID rhg1_lee_amplicon, contains two rhg1, v.2 exons at 
coding coordinates 1733-3698 and 3910-4508, and has an amino acid translation of SEQ ID NO: 
1113. 

SEQ ID NO: 22 is sequence ID rhg1_pi200499_amplicon from the line PI200499, 
contains four rhg1, v.1 exons at coding coordinates 113-264, 400-459, 1876-3698, and 3910- 
4508, and has an amino acid translation of SEQ ID NO: 1114. 

SEQ ID NO: 23 is sequence ID rhg1_pi200499_amplicon, contains two rhg1, v.2 exons 
at coding coordinates 1733-3698 and 3910-4508, and has an amino acid translation of SEQ ID 
NO: 1115. 

SEQ ID NO: 24 is sequence ID 240O17_region_G3_forward_1, is a primer that 
hybridizes to coordinates 45051-45077 on contig 240O17_region_G3 before the start codon, and 
can be used with SEQ ID NO: 25. 

SEQ ID NO: 25 is sequence ID 240O17_region_G3_reverse_1, is a primer that 
hybridizes to coordinates 47942-47918 on contig 240O17_region_G3, and can be used with SEQ 
ID NO: 24. 

SEQ ID NO: 26 is sequence ID 240O17_region_G3_forward_2, is a primer that 
hybridizes to coordinates 47808-47831 on contig 240O17_region_G3, and can be used with SEQ 
ID NO: 27. 

SEQ ID NO: 27 is sequence ID 240O17_region_G3_reverse_2, is a primer that 
hybridizes to coordinates 49553-49531 of contig 240O17_region_G3 prior to the stop codon, and 
can be used with SEQ ID NO: 26. 

Primers given by SEQ ID NOs: 24-27 are used to create the amplicons of SEQ ID NOs: 
8-23. The final 22 bases are added to the actual amplicons in order to simulate the rest of the 
gene to the stop codon, in order to allow complete translation. 
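
The amplicon coding coordinates listed for SEQ ID NO: 8 appear to be the contig coding coordinates listed for SEQ ID NO: 2, renumbered so that the first base of the forward primer of SEQ ID NO: 24 (contig position 45051) becomes amplicon position 1. The Python sketch below illustrates that conversion. It assumes 1-based inclusive coordinates, which the listing does not state explicitly, and the function and variable names are hypothetical.

```python
# Exon coordinates on contig 240O17_region_G3, copied from the description
# of SEQ ID NO: 2 (rhg1, v.1 four exon gene).
CONTIG_EXONS_V1 = [
    (45163, 45314),
    (45450, 45509),
    (46941, 48763),
    (48975, 49573),
]

# First contig coordinate of the forward primer of SEQ ID NO: 24.
FORWARD_PRIMER_START = 45051


def to_amplicon_coords(exons, primer_start):
    """Renumber contig coordinates so that primer_start becomes position 1
    (assumes 1-based, inclusive coordinates)."""
    offset = primer_start - 1
    return [(start - offset, end - offset) for start, end in exons]


def cds_length(exons):
    """Total coding length in bases, counting both end points of each exon."""
    return sum(end - start + 1 for start, end in exons)


amplicon_exons = to_amplicon_coords(CONTIG_EXONS_V1, FORWARD_PRIMER_START)
total_length = cds_length(CONTIG_EXONS_V1)

print(amplicon_exons)
print(total_length, total_length % 3)
```

Under the stated assumption, the renumbered exons match the coordinates given for SEQ ID NO: 8 (113-264, 400-459, 1891-3713, and 3925-4523), and the summed exon length is a multiple of three, as expected for a complete coding sequence.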

SEQ ID NO: 28 is sequence ID rhg1_A3244_amplicon_cds, which is the coding 
sequence portion of SEQ ID NO: 8. 

SEQ ID NO: 29 is sequence ID rhg1_peking_amplicon_cds, which is the coding 
sequence portion of SEQ ID NO: 10. 

SEQ ID NO: 30 is sequence ID rhg1_toyosuzu_amplicon_cds, which is the coding 
sequence portion of SEQ ID NO: 12. 

SEQ ID NO: 31 is sequence ID rhg1_will_amplicon_cds, which is the coding sequence 
portion of SEQ ID NO: 14. 

SEQ ID NO: 32 is sequence ID rhg1_a2704_amplicon_cds, which is the coding sequence 
portion of SEQ ID NO: 16. 

SEQ ID NO: 33 is sequence ID rhg1_noir_amplicon_cds, which is the coding sequence 
portion of SEQ ID NO: 18. 

SEQ ID NO: 34 is sequence ID rhg1_lee_amplicon_cds, which is the coding sequence 
portion of SEQ ID NO: 20. 

SEQ ID NO: 35 is sequence ID rhg1_pi200499_amplicon_cds, which is the coding 
sequence portion of SEQ ID NO: 22. 

SEQ ID NO: 36 is sequence ID rhg1_A3244_amplicon_cds_2, which is the coding 
sequence portion of SEQ ID NO: 9. 

SEQ ID NO: 37 is sequence ID rhg1_peking_amplicon_cds_2, which is the coding 
sequence portion of SEQ ID NO: 11. 

SEQ ID NO: 38 is sequence ID rhg1_toyosuzu_amplicon_cds_2, which is the coding 
sequence portion of SEQ ID NO: 13. 

SEQ ID NO: 39 is sequence ID rhg1_will_amplicon_cds_2, which is the coding sequence 
portion of SEQ ID NO: 15. 

SEQ ID NO: 40 is sequence ID rhg1_a2704_amplicon_cds_2, which is the coding 
sequence portion of SEQ ID NO: 17. 

SEQ ID NO: 41 is sequence ID rhg1_noir_amplicon_cds_2, which is the coding 
sequence portion of SEQ ID NO: 19. 

SEQ ID NO: 42 is sequence ID rhg1_lee_amplicon_cds_2, which is the coding sequence 
portion of SEQ ID NO: 21. 

SEQ ID NO: 43 is sequence ID rhg1_pi200499_amplicon_cds_2, which is the coding 
sequence portion of SEQ ID NO: 23. 

SEQ ID NOs: 44-53 and 1116-1119 all refer to Rhg4 sequences 
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SEQ ID NO: 44 is sequence ID rhg4_a3244_amplicon from the line A3244, contains 
Rhg4 at coding coordinates 79-2242 and 2958-3478, is made using SEQ ID NOs: 48 and 49, and 
has an amino acid translation of SEQ ID NO: 1116 and 1099, 

SEQ ID NO: 45 is sequence ID rhg4_Minsoy_amphcon from the line Minsoy, contains 
5 Rhg4 at coding coordinates 79-2242 and 2958-3478, is made using SEQ ID NOs: 48 and 49, and 
has an amino acid translation of SEQ ID NO: 1117. 

SEQ ED NO: 46 is sequence ID rhg4_Jack__amplicon from the line Jack, contains Rhg4 at 
coding coordinates 79-2242 and 2958-3478, is made using SEQ ID NO: 48 and 49, and has an 
amino acid translation of SEQ ID NO: 1118. 
10 SEQ ID NO: 47 is sequence ID rhg4_peking„amplicon from the line Peking, contains 

J Rhg4 at coding coordinates 79-2242 and 2958-3478, is made using SEQ ID NOs: 48 and 49, and 

has an amino acid translation of SEQ ID NO: 1 119. 
J SEQ ID NO: 48 is sequence ID 318013_region_A3_forward, hybridizes to coordinates 

fl 111727-111756 of contig 318013 jegion_A3, and is a primer used with SEQ ID NO: 49 to 
5L5 create Rhg4 amplicons. 

H SEQ ID NO: 49 is sequence ID 318013j:egion_A3_reverse, hybridizes to coordinates 

111 115206-115177 of contig 318013_region_A3, and is a primer used with SEQ ID NO: 48 to 
H create Rhg4 amplicons. 

SEQ ID NO: 50 is sequence ID rhg4_A3244_amplicon_cds, which is the coding 
20 sequence portion of SEQ ID NO: 44. 

SEQ ID NO: 51 is sequence ID rhg4_Minsoy_amplicon_cds, which is the coding 
sequence portion of SEQ ID NO: 45. 

SEQ ID NO: 52 is sequence ID rhg4_Jack_amplicon_cds, which is the coding sequence
portion of SEQ ID NO: 46.

SEQ ID NO: 53 is sequence ID rhg4_peking_amplicon_cds, which is the coding
sequence portion of SEQ ID NO: 47.

SEQ ID NO: 1120 is sequence ID consensusLRR, which is a consensus sequence for the
LRR repeats shown in Figures 1 and 2.

SEQ ID NO: 1121 is sequence ID rhg1LRR, which is the amino acid sequence of the
LRR domain shown in Figure 1.

SEQ ID NO: 1122 is sequence ID Rhg4LRR, which is the amino acid sequence of the
LRR domain shown in Figure 2.

SEQ ID NO: 1123 is sequence ID 240O17_region_G3_forward_1_b, which is an
alternate primer that hybridizes to coordinates 45046-45072 on contig 240O17_region_G3 before
the start codon, and which can be used with SEQ ID NO: 25.

Table 1 below provides further information on the sequences described herein.

In Table 1, for all rows, "Seq Num" refers to the corresponding SEQ ID NO in the
sequence listing.

For rows with SEQ ID NOs: 1-53 and 1120-1123, "Seq ID" refers to the name of the SEQ
ID NO given in the "Seq Num" column.

For rows with SEQ ID NOs: 2-4, 8-23, and 44-47, "Coding Sequence" refers to the
coordinates of the coding portion of the SEQ ID NO given in the "Seq Num" column, and "AA"
refers to the SEQ ID NO that is the amino acid translation of the SEQ ID NO given in the "Seq
Num" column.
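
As an illustration of how a "Coding Sequence" entry can be read, the following Python sketch splits an entry into its coordinate ranges and sums their lengths. It assumes that the coordinates are 1-based and inclusive, which this description does not state explicitly, and the function names are hypothetical and are not part of the sequence listing.

```python
def parse_coding_sequence(entry):
    """Split a "Coding Sequence" entry such as "79-2242,2958-3478"
    into a list of (start, end) integer pairs."""
    segments = []
    for part in entry.split(","):
        start, end = part.strip().split("-")
        segments.append((int(start), int(end)))
    return segments


def coding_length(entry):
    """Total number of bases covered by the listed ranges,
    assuming 1-based inclusive coordinates (an assumption)."""
    return sum(end - start + 1 for start, end in parse_coding_sequence(entry))


# Coding coordinates given for SEQ ID NO: 44 (rhg4_a3244_amplicon).
rhg4_entry = "79-2242,2958-3478"
print(parse_coding_sequence(rhg4_entry))  # [(79, 2242), (2958, 3478)]
print(coding_length(rhg4_entry))          # 2685
```

Under that assumption the two ranges cover 2685 bases, a multiple of three, which is what would be expected of a coding portion that is translated codon by codon.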

For rows with SEQ ID NOs: 24-27 and 1123, "Primer location on 240O17_region_G3"
refers to the coordinates of the 240O17_region_G3 contig to which the SEQ ID NO given in the
"Seq Num" column hybridizes.

For rows with SEQ ID NOs: 48 and 49, "Primer location on 318O13_region_A3" refers
to the coordinates of the 318O13_region_A3 contig to which the SEQ ID NO given in the "Seq
Num" column hybridizes.
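
The primer locations can be related to the amplicon coordinates by a simple offset. The Python sketch below is an illustration only: it assumes that position 1 of an amplicon corresponds to the first contig base bound by the forward primer, an assumption that is not stated in this description but is consistent with the Rhg4 values given above. The function name is hypothetical.

```python
def amplicon_to_contig(amplicon_pos, forward_primer_start):
    """Convert a 1-based amplicon coordinate to a contig coordinate,
    assuming the amplicon starts at the forward primer's first base."""
    return forward_primer_start + amplicon_pos - 1


# SEQ ID NO: 48 hybridizes to contig coordinates 111727-111756.
forward_start = 111727

# Coding coordinates given for the Rhg4 amplicons (SEQ ID NOs: 44-47).
amplicon_ranges = [(79, 2242), (2958, 3478)]

contig_ranges = [
    (amplicon_to_contig(start, forward_start), amplicon_to_contig(end, forward_start))
    for start, end in amplicon_ranges
]
print(contig_ranges)  # [(111805, 113968), (114684, 115204)]
```

The resulting ranges, 111805-113968 and 114684-115204, agree with the coding coordinates listed in Table 1 for SEQ ID NO: 4.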

For rows with SEQ ID NOs: 54-400, "Seq ID" refers to the names of amplicon
sequences. Within the Seq ID is the "__" (double-length underscore) symbol. The name before
this symbol refers to the name of the contig in which the amplicon is found, and the numbers
after this symbol refer to the nucleotide location of the SSR on the contig.
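
A short Python sketch of how such a name might be split is given below. It is illustrative only: the regular expression, the function name, and the field names are assumptions. The separator is accepted as either a single or a double underscore because the underscores are not reproduced consistently in Table 1, and the meaning of the final number in each name is not stated in this description, so it is returned uninterpreted.

```python
import re

# Contig name, then the SSR location, then a trailing number whose
# meaning is not defined in the text (kept here as "suffix").
AMPLICON_ID = re.compile(r"^(?P<contig>.+?)_+(?P<location>\d+)_+(?P<suffix>\d+)$")


def parse_amplicon_id(seq_id):
    """Split an amplicon "Seq ID" into contig name, SSR location, and suffix."""
    match = AMPLICON_ID.match(seq_id)
    if match is None:
        raise ValueError("unrecognized amplicon Seq ID: " + seq_id)
    return {
        "contig": match.group("contig"),
        "location": int(match.group("location")),
        "suffix": int(match.group("suffix")),
    }


# Name listed for SEQ ID NO: 54, written with underscores restored.
print(parse_amplicon_id("240O17_region_G3_289711_11"))
```

Because the contig name itself contains underscores, the sketch anchors on the two numeric fields at the end of the name rather than splitting on the first separator.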

For rows with SEQ ID NOs: 401-1096, "Seq ID" refers to the names of primer sequences
used in PCR to generate the amplicon sequences in Table 1. For these rows, the "Seq ID" name
contains the same name as the amplicon that is generated by the pair of primers of which the
SEQ ID NO referred to in the first column is a member. The "Seq ID" name also contains either
"Forward" or "Reverse," which indicates the orientation of the primer. For these sequences,
"location of primer on contig start" and "location of primer on contig end" refer, respectively, to
the first and last base number of the contig on which the primer aligns.
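
The naming and coordinate conventions for the primer rows can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the primer name is written with its underscores restored, and the span calculation assumes that the PCR product runs from the first base of the forward primer to the first base of the reverse primer, which this description does not state.

```python
def parse_primer_id(seq_id):
    """Split a primer "Seq ID" into the amplicon name and the orientation."""
    for orientation in ("Forward", "Reverse"):
        marker = "_" + orientation + "_Primer"
        if seq_id.endswith(marker):
            return seq_id[: -len(marker)], orientation
    raise ValueError("unrecognized primer Seq ID: " + seq_id)


def primer_length(start, end):
    """Contig bases covered by a primer. Reverse primers are listed with
    start greater than end, so the absolute difference is used."""
    return abs(end - start) + 1


def product_span(forward_start, reverse_start):
    """Contig bases from the forward primer start to the reverse primer
    start, inclusive (assumed to delimit the PCR product)."""
    return reverse_start - forward_start + 1


# Values listed in Table 1 for SEQ ID NOs: 401 and 402.
forward = ("240O17_region_G3_289711_11_Forward_Primer", 289637, 289661)
reverse = ("240O17_region_G3_289711_11_Reverse_Primer", 289756, 289732)

print(parse_primer_id(forward[0]))              # ('240O17_region_G3_289711_11', 'Forward')
print(primer_length(forward[1], forward[2]))    # 25
print(primer_length(reverse[1], reverse[2]))    # 25
print(product_span(forward[1], reverse[1]))     # 120
```

Both members of a pair resolve to the same amplicon name, which is how a primer row can be matched to its amplicon row.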



TABLE 1 



Seq Num  Seq ID
1        515O02_region_G2

Seq Num  Seq ID            Coding Sequence                                  AA No.
2        240O17_region_G3  45163-45314,45450-45509,46941-48763,48975-49573  1097
3        240O17_region_G3  46798-48763,48975-49573                          1098
4        318O13_region_A3  111805-113968,114684-115204                      1099

Seq Num  Seq ID
5        240O17_region_G3_8_mRNA
6        240O17_region_G3_8_cds
7        318O13_region_A3_17_cds

Seq Num  Seq ID                  Coding Sequence                      AA No.
8        rhg1_A3244_amplicon     113-264,400-459,1891-3713,3925-4523  1100
9        rhg1_A3244_amplicon     1748-3713,3925-4523                  1101
10       rhg1_peking_amplicon    113-264,400-459,1888-3710,3903-4501  1102
11       rhg1_peking_amplicon    1745-3710,3903-4501                  1103
12       rhg1_toyosuzu_amplicon  113-264,400-459,1890-3712,3924-4522  1104
13       rhg1_toyosuzu_amplicon  1747-3712,3924-4522                  1105
14       rhg1_will_amplicon      113-264,400-459,1891-3713,3925-4523  1106
15       rhg1_will_amplicon      1748-3713,3925-4523                  1107
16       rhg1_a2704_amplicon     113-264,400-459,1891-3713,3925-4523  1108
17       rhg1_a2704_amplicon     1748-3713,3925-4523                  1109
18       rhg1_noir_amplicon      113-264,400-459,1876-3698,3910-4508  1110
19       rhg1_noir_amplicon      1733-3698,3910-4508                  1111
20       rhg1_lee_amplicon       113-264,400-459,1876-3698,3910-4508  1112
21       rhg1_lee_amplicon       1733-3698,3910-4508                  1113
22       rhg1_pi200499_amplicon  113-264,400-459,1876-3698,3910-4508  1114
23       rhg1_pi200499_amplicon  1733-3698,3910-4508                  1115

Seq Num  Seq ID                      Primer location on 240O17_region_G3
24       240O17_region_G3_forward_1  45051-45077
25       240O17_region_G3_reverse_1  47942-47918
26       240O17_region_G3_forward_2  47808-47831
27       240O17_region_G3_reverse_2  49553-49531

Seq Num  Seq ID
28       rhg1_A3244_amplicon_cds
29       rhg1_peking_amplicon_cds
30       rhg1_toyosuzu_amplicon_cds
31       rhg1_will_amplicon_cds
32       rhg1_a2704_amplicon_cds
33       rhg1_noir_amplicon_cds
34       rhg1_lee_amplicon_cds
35       rhg1_pi200499_amplicon_cds
36       rhg1_A3244_amplicon_cds_2
37       rhg1_peking_amplicon_cds_2
38       rhg1_toyosuzu_amplicon_cds_2
39       rhg1_will_amplicon_cds_2
40       rhg1_a2704_amplicon_cds_2
41       rhg1_noir_amplicon_cds_2
42       rhg1_lee_amplicon_cds_2
43       rhg1_pi200499_amplicon_cds_2

Seq Num  Seq ID                Coding Sequence    AA No.
44       rhg4_a3244_amplicon   79-2242,2958-3478  1116
45       rhg4_Minsoy_amplicon  79-2242,2958-3478  1117
46       rhg4_Jack_amplicon    79-2242,2958-3478  1118
47       rhg4_peking_amplicon  79-2242,2958-3478  1119

Seq Num  Seq ID                    Primer location on 318O13_region_A3
48       318O13_region_A3_forward  111727-111756
49       318O13_region_A3_reverse  115206-115177

Seq Num  Seq ID
50       rhg4_A3244_amplicon_cds
51       rhg4_Minsoy_amplicon_cds
52       rhg4_Jack_amplicon_cds
53       rhg4_peking_amplicon_cds

Seq Num  Seq ID
54       240O17_region_G3_289711_11
55       240O17_region_G3_236585_14
56       240O17_region_G3_168772_13
57       240O17_region_G3_332420_21
58       240O17_region_G3_228126_18
59       240O17_region_G3_139723_11
60       240O17_region_G3_280585_14
61       240O17_region_G3_70509_14
62       240O17_region_G3_50537_17
63       240O17_region_G3_231556_17
64       240O17_region_G3_117057_11
65       240O17_region_G3_23092_13
66       240O17_region_G3_297741_14
67       240O17_region_G3_206502_14
68       240O17_region_G3_221223_13
69       240O17_region_G3_169084_14
70       240O17_region_G3_94891_14
71       240O17_region_G3_281852_61
72       240O17_region_G3_46583_12
73       240O17_region_G3_306835_13
74       240O17_region_G3_85471_12
75       240O17_region_G3_257208_12
76       240O17_region_G3_150390_17
77       240O17_region_G3_34697_75
78       240O17_region_G3_150374_13
79       240O17_region_G3_40513_22
80       240O17_region_G3_268602_14
81       240O17_region_G3_25357_13
82       240O17_region_G3_137548_13
83       240O17_region_G3_139131_13
84       240O17_region_G3_203855_12
85       240O17_region_G3_199049_15
86       240O17_region_G3_320907_12
87       240O17_region_G3_16407_17
88       240O17_region_G3_206516_17
89       240O17_region_G3_264495_13
90       240O17_region_G3_156785_13
91       240O17_region_G3_187129_12
92       240O17_region_G3_214106_13
93       240O17_region_G3_149013_12
94       240O17_region_G3_326352_16
95       240O17_region_G3_278962_12
96       240O17_region_G3_256930_13
97       240O17_region_G3_29646_14
98       240O17_region_G3_29618_13
99       240O17_region_G3_108561_14
100      240O17_region_G3_143975_14
101      240O17_region_G3_108431_20
102      240O17_region_G3_281764_11
103      240O17_region_G3_130058_15
104      240O17_region_G3_310590_52
105      240O17_region_G3_313405_14
106      240O17_region_G3_302190_13
107      240O17_region_G3_225343_17
108      240O17_region_G3_208823_14
109      240O17_region_G3_74285_11
110      240O17_region_G3_109052_16
111      240O17_region_G3_6395_12
112      240O17_region_G3_244905_16
113      240O17_region_G3_244956_13
114      240O17_region_G3_117220_13
115      240O17_region_G3_134707_14
116      240O17_region_G3_35078_13
117      240O17_region_G3_210506_16
118      240O17_region_G3_116961_26
119      240O17_region_G3_51073_13
120      240O17_region_G3_55291_15
121      240O17_region_G3_229651_18
122      240O17_region_G3_303308_19
123      240O17_region_G3_168373_20
124      240O17_region_G3_253333_17
125      240O17_region_G3_5791_13
126      240O17_region_G3_206841_19
127      240O17_region_G3_202827_12
128      240O17_region_G3_322656_13
129      240O17_region_G3_111841_14
130      240O17_region_G3_192719_13
131      240O17_region_G3_195630_17
132      240O17_region_G3_69999_13
133      240O17_region_G3_11176_13
134      240O17_region_G3_228643_13
135      240O17_region_G3_88478_19
136      240O17_region_G3_108950_13
137      240O17_region_G3_121054_14
138      240O17_region_G3_188337_14
139      240O17_region_G3_255944_21
140      240O17_region_G3_219518_14
141      240O17_region_G3_235601_15
142      240O17_region_G3_301529_13
143      240O17_region_G3_94795_14
144      240O17_region_G3_46703_23
145      240O17_region_G3_59616_14
146      240O17_region_G3_296933_15
147      240O17_region_G3_192428_17
148      240O17_region_G3_191490_14
149      240O17_region_G3_201115_11
150      240O17_region_G3_72882_15
151      240O17_region_G3_69514_13
152      240O17_region_G3_37699_47
153      240O17_region_G3_11301_29
154      240O17_region_G3_141875_12
155      240O17_region_G3_98090_18
156      240O17_region_G3_43298_35
157      240O17_region_G3_262094_11
158      240O17_region_G3_262079_15
159      240O17_region_G3_59090_12
160      240O17_region_G3_245723_13
161      240O17_region_G3_194628_54
162      240O17_region_G3_4566_16
163      240O17_region_G3_96209_14
164      240O17_region_G3_248715_17
165      240O17_region_G3_71410_40
166      240O17_region_G3_226519_13
167      240O17_region_G3_11282_19
168      240O17_region_G3_170504_12
169      240O17_region_G3_40864_14
170      240O17_region_G3_13529_14
171      240O17_region_G3_22858_14
172      240O17_region_G3_309211_13
173      240O17_region_G3_55568_26
174      240O17_region_G3_73238_16
175      240O17_region_G3_52488_19
176      318O13_region_A3_471518_14
177      318O13_region_A3_231599_23
178      318O13_region_A3_375912_13
179      318O13_region_A3_180013_12
180      318O13_region_A3_171606_14
181      318O13_region_A3_416256_13
182      318O13_region_A3_231395_15
183      318O13_region_A3_5502_47
184      318O13_region_A3_93061_14
185      318O13_region_A3_111684_19
186      318O13_region_A3_69328_14
187      318O13_region_A3_36529_17
188      318O13_region_A3_139128_12
189      318O13_region_A3_495674_13
190      318O13_region_A3_187577_13
191      318O13_region_A3_453036_14
192      318O13_region_A3_374041_13
193      318O13_region_A3_3412_11
194      318O13_region_A3_276495_28
195      318O13_region_A3_151839_17
196      318O13_region_A3_292912_12
197      318O13_region_A3_104560_12
198      318O13_region_A3_65193_11
199      318O13_region_A3_110573_70
200      318O13_region_A3_65117_12
201      318O13_region_A3_490837_16
202      318O13_region_A3_107448_11
203      318O13_region_A3_331_23
204      318O13_region_A3_193470_13
205      318O13_region_A3_183305_14
206      318O13_region_A3_55050_14
207      318O13_region_A3_224693_21
208      318O13_region_A3_207216_12
209      318O13_region_A3_4654_22
210      318O13_region_A3_408959_13
211      318O13_region_A3_132288_22
212      318O13_region_A3_292822_20
213      318O13_region_A3_311076_12
214      318O13_region_A3_509623_13
215      318O13_region_A3_190404_14
216      318O13_region_A3_164916_15
217      318O13_region_A3_21028_13
218      318O13_region_A3_208012_17
219      318O13_region_A3_484089_14
220      318O13_region_A3_332780_17
221      318O13_region_A3_480137_37
222      318O13_region_A3_441056_14
223      318O13_region_A3_77486_11
224      318O13_region_A3_272468_11
225      318O13_region_A3_425319_17
226      318O13_region_A3_413879_31
227      318O13_region_A3_80477_64
228      318O13_region_A3_277272_50
229      318O13_region_A3_509642_13
230      318O13_region_A3_321771_14
231      318O13_region_A3_26788_12
232      318O13_region_A3_262706_16
233      318O13_region_A3_243928_16
234      318O13_region_A3_23246_14
235      318O13_region_A3_165406_12
236      318O13_region_A3_486294_14
237      318O13_region_A3_46754_12
238      318O13_region_A3_381116_15
239      318O13_region_A3_350369_11
240      318O13_region_A3_138841_13
241      318O13_region_A3_12158_14
242      318O13_region_A3_315368_13
243      318O13_region_A3_307549_13
244      318O13_region_A3_159857_14
245      318O13_region_A3_140551_15
246      318O13_region_A3_279869_11
247      318O13_region_A3_78292_35
248      318O13_region_A3_185019_12
249      318O13_region_A3_409164_13
250      318O13_region_A3_75392_14
251      318O13_region_A3_231320_12
252      318O13_region_A3_381102_14
253      318O13_region_A3_491826_15
254      318O13_region_A3_56365_21
255      318O13_region_A3_372628_15
256      318O13_region_A3_302609_11
257      318O13_region_A3_341804_11
258      318O13_region_A3_217037_11
259      318O13_region_A3_264929_68
260      318O13_region_A3_55499_12
261      318O13_region_A3_295634_14
262      318O13_region_A3_269358_15
263      318O13_region_A3_457009_24
264      318O13_region_A3_176598_14
265      318O13_region_A3_278266_12
266      318O13_region_A3_391810_12
267      318O13_region_A3_269485_15
268      318O13_region_A3_359247_17
269      318O13_region_A3_315094_13
270      318O13_region_A3_307823_13
271      318O13_region_A3_248588_15
272      318O13_region_A3_252426_85
273      318O13_region_A3_513314_16
274      318O13_region_A3_68183_14
275      318O13_region_A3_471191_13
276      318O13_region_A3_163547_18
277      318O13_region_A3_417867_15
278      318O13_region_A3_332465_14
279      318O13_region_A3_207697_14
280      318O13_region_A3_277229_43
281      318O13_region_A3_36366_11
282      318O13_region_A3_91970_12
283      318O13_region_A3_211533_11
284      318O13_region_A3_336301_11
285      318O13_region_A3_441603_14
286      318O13_region_A3_468354_15
287      318O13_region_A3_188983_18
288      318O13_region_A3_115502_17
289      318O13_region_A3_163006_13
290      318O13_region_A3_119283_14
291      318O13_region_A3_491126_11
292      318O13_region_A3_99512_21
293      318O13_region_A3_280291_17
294      318O13_region_A3_138443_19
295      318O13_region_A3_115973_14
296      318O13_region_A3_329977_14
297      318O13_region_A3_205203_14
298      318O13_region_A3_153114_12
299      318O13_region_A3_34581_13
300      318O13_region_A3_292577_19
301      318O13_region_A3_445391_20
302      318O13_region_A3_350540_17
303      318O13_region_A3_453879_15
304      318O13_region_A3_201246_13
305      318O13_region_A3_326020_13
306      318O13_region_A3_503801_14
307      318O13_region_A3_302400_52
308      318O13_region_A3_448857_15
309      318O13_region_A3_48364_14
310      318O13_region_A3_251804_48
311      318O13_region_A3_382583_13
312      318O13_region_A3_124737_14
313      318O13_region_A3_124766_13
314      318O13_region_A3_461351_16
315      318O13_region_A3_64953_19
316      318O13_region_A3_366586_13
317      318O13_region_A3_46190_15
318      318O13_region_A3_81016_11
319      318O13_region_A3_134426_14
320      318O13_region_A3_292724_14
321      318O13_region_A3_187096_17
322      318O13_region_A3_381693_13
323      318O13_region_A3_361286_33
324      318O13_region_A3_482668_14
325      318O13_region_A3_128002_12
326      318O13_region_A3_499270_14
327      318O13_region_A3_231650_12
328      318O13_region_A3_199851_13
329      318O13_region_A3_324629_13
330      318O13_region_A3_374190_19
331      318O13_region_A3_460603_13
332      318O13_region_A3_108681_14
333      318O13_region_A3_459791_47
334      318O13_region_A3_4257_20
335      318O13_region_A3_238810_14
336      318O13_region_A3_245817_14
337      318O13_region_A3_245956_14
338      318O13_region_A3_74148_14
339      318O13_region_A3_74089_15
340      318O13_region_A3_241686_12
341      318O13_region_A3_47476_12
342      318O13_region_A3_164550_12
343      318O13_region_A3_101255_15
344      515O02_region_G2_16189_11
345      515O02_region_G2_71925_13
346      515O02_region_G2_4707_12
347      515O02_region_G2_118904_18
348      515O02_region_G2_13655_17
349      515O02_region_G2_53900_13
350      515O02_region_G2_8079_14
351      515O02_region_G2_9969_28
352      515O02_region_G2_72308_77
353      515O02_region_G2_99475_19
354      515O02_region_G2_118615_18
355      515O02_region_G2_119001_46
356      515O02_region_G2_118958_43
357      515O02_region_G2_17197_13
358      515O02_region_G2_105163_29
359      515O02_region_G2_111335_13
360      515O02_region_G2_106396_13
361      515O02_region_G2_59229_17
362      515O02_region_G2_73795_20
363      515O02_region_G2_85664_20
364      515O02_region_G2_36921_17
365      515O02_region_G2_124150_19
366      515O02_region_G2_5089_14
367      515O02_region_G2_58221_15
368      515O02_region_G2_96139_14
369      515O02_region_G2_70595_13
370      515O02_region_G2_4340_15
371      515O02_region_G2_90417_11
372      515O02_region_G2_49711_17
373      515O02_region_G2_63053_13
374      515O02_region_G2_63076_14
375      515O02_region_G2_44442_12
376      515O02_region_G2_44422_19
377      515O02_region_G2_44158_19
378      515O02_region_G2_44141_17
379      515O02_region_G2_90762_17
380      515O02_region_G2_106241_14
381      515O02_region_G2_109676_12
382      515O02_region_G2_86242_14
383      515O02_region_G2_83109_12
384      515O02_region_G2_10461_15
385      515O02_region_G2_67608_15
386      515O02_region_G2_63275_46
387      515O02_region_G2_62405_14
388      515O02_region_G2_33563_12
389      515O02_region_G2_33146_14
390      515O02_region_G2_102179_29
391      515O02_region_G2_2646_15
392      515O02_region_G2_76652_24
393      515O02_region_G2_66280_14
394      515O02_region_G2_54768_13
395      515O02_region_G2_62580_14
396      515O02_region_G2_34598_55
397      515O02_region_G2_77680_13
398      515O02_region_G2_77693_12
399      515O02_region_G2_97392_14
400      515O02_region_G2_97359_15

Seq Num  Seq ID  location of primer on contig start  location of primer on contig end
401      240O17_region_G3_289711_11_Forward_Primer  289637  289661
402      240O17_region_G3_289711_11_Reverse_Primer  289756  289732
403      240O17_region_G3_236585_14_Forward_Primer  236511  236535
404      240O17_region_G3_236585_14_Reverse_Primer  236638  236614
405      240O17_region_G3_168772_13_Forward_Primer  168683  168707
406      240O17_region_G3_168772_13_Reverse_Primer  168811  168786
407      240O17_region_G3_332420_21_Forward_Primer  332375  332399
408      240O17_region_G3_332420_21_Reverse_Primer  332505  332481
409      240O17_region_G3_228126_18_Forward_Primer  228048  228072
410      240O17_region_G3_228126_18_Reverse_Primer  228182  228158
411      240O17_region_G3_139723_11_Forward_Primer  139666  139690
412      240O17_region_G3_139723_11_Reverse_Primer  139802  139778
413      240O17_region_G3_280585_14_Forward_Primer  280524  280550
414      240O17_region_G3_280585_14_Reverse_Primer  280661  280637
415      240O17_region_G3_70509_14_Forward_Primer  70478  70502
416      240O17_region_G3_70509_14_Reverse_Primer  70616  70592
417      240O17_region_G3_50537_17_Forward_Primer  50455  50479
418      240O17_region_G3_50537_17_Reverse_Primer  50593  50569
419      240O17_region_G3_231556_17_Forward_Primer  231468  231492
420      240O17_region_G3_231556_17_Reverse_Primer  231606  231582
421      240O17_region_G3_117057_11_Forward_Primer  117029  117053
422      240O17_region_G3_117057_11_Reverse_Primer  117169  117145
423      240O17_region_G3_23092_13_Forward_Primer  23010  23034
424      240O17_region_G3_23092_13_Reverse_Primer  23151  23127
425      240O17_region_G3_297741_14_Forward_Primer  297680  297704
426      240O17_region_G3_297741_14_Reverse_Primer  297823  297799
427      240O17_region_G3_206502_14_Forward_Primer  206456  206480
428      240O17_region_G3_206502_14_Reverse_Primer  206600  206581
429      240O17_region_G3_221223_13_Forward_Primer  221134  221158
430      240O17_region_G3_221223_13_Reverse_Primer  221278  221254
431      240O17_region_G3_169084_14_Forward_Primer  169051  169075
432      240O17_region_G3_169084_14_Reverse_Primer  169196  169173
433      240O17_region_G3_94891_14_Forward_Primer  94784  94808
434      240O17_region_G3_94891_14_Reverse_Primer  94929  94905
435      240O17_region_G3_7439_12_Forward_Primer  7397  7421
436      240O17_region_G3_7439_12_Reverse_Primer  7542  7518
437      240O17_region_G3_281852_61_Forward_Primer  281797  281821
438      240O17_region_G3_281852_61_Reverse_Primer  281943  281919
439      240O17_region_G3_46583_12_Forward_Primer  46554  46578
440      240O17_region_G3_46583_12_Reverse_Primer  46700  46676
441      240O17_region_G3_306835_13_Forward_Primer  306727  306751
442      240O17_region_G3_306835_13_Reverse_Primer  306874  306849
443      240O17_region_G3_85471_12_Forward_Primer  85359  85383
444      240O17_region_G3_85471_12_Reverse_Primer  85507  85483
445      240O17_region_G3_257208_12_Forward_Primer  257129  257153
446      240O17_region_G3_257208_12_Reverse_Primer  257278  257254
447      240O17_region_G3_150390_17_Forward_Primer  150327  150351
448      240O17_region_G3_150390_17_Reverse_Primer  150476  150452
449      240O17_region_G3_34697_75_Forward_Primer  34662  34685
450      240O17_region_G3_34697_75_Reverse_Primer  34811  34787
451      240O17_region_G3_150374_13_Forward_Primer  150327  150351
452      240O17_region_G3_150374_13_Reverse_Primer  150476  150452
453      240O17_region_G3_40513_22_Forward_Primer  40422  40446
454      240O17_region_G3_40513_22_Reverse_Primer  40572  40548
455      240O17_region_G3_268602_14_Forward_Primer  268555  268579
456      240O17_region_G3_268602_14_Reverse_Primer  268705  268681
457      240O17_region_G3_25357_13_Forward_Primer  25271  25295
458      240O17_region_G3_25357_13_Reverse_Primer  25422  25402
459      240O17_region_G3_137548_13_Forward_Primer  139088  139111
459      240O17_region_G3_137548_13_Forward_Primer  137505  137528
460      240O17_region_G3_137548_13_Reverse_Primer  139239  139215
460      240O17_region_G3_137548_13_Reverse_Primer  137656  137632
461      240O17_region_G3_139131_13_Forward_Primer  139088  139111
462      240O17_region_G3_139131_13_Reverse_Primer  139239  139215
463      240O17_region_G3_203855_12_Forward_Primer  203749  203773
464      240O17_region_G3_203855_12_Reverse_Primer  203901  203877
465      240O17_region_G3_199049_15_Forward_Primer  199008  199033
466      240O17_region_G3_199049_15_Reverse_Primer  199160  199136
467      240O17_region_G3_320907_12_Forward_Primer  320885  320906
468      240O17_region_G3_320907_12_Reverse_Primer  321038  321015
469      240O17_region_G3_16407_17_Forward_Primer  16330  16354
470      240O17_region_G3_16407_17_Reverse_Primer  16483  16459
471      240O17_region_G3_206516_17_Forward_Primer  206482  206506
472      240O17_region_G3_206516_17_Reverse_Primer  206635  206616
473      240O17_region_G3_264495_13_Forward_Primer  264423  264447
474      240O17_region_G3_264495_13_Reverse_Primer  264577  264553
475      240O17_region_G3_156785_13_Forward_Primer  156713  156737
476      240O17_region_G3_156785_13_Reverse_Primer  156868  156844
477      240O17_region_G3_187129_12_Forward_Primer  187068  187092
478      240O17_region_G3_187129_12_Reverse_Primer  187223  187199
479      240O17_region_G3_214106_13_Forward_Primer  214042  214067
480      240O17_region_G3_214106_13_Reverse_Primer  214197  214173
481      240O17_region_G3_149013_12_Forward_Primer  148898  148922
482      240O17_region_G3_149013_12_Reverse_Primer  149053  149027
483      240O17_region_G3_326352_16_Forward_Primer  326311  326335
484      240O17_region_G3_326352_16_Reverse_Primer  326467  326443
485      240O17_region_G3_278962_12_Forward_Primer  278933  278957
486      240O17_region_G3_278962_12_Reverse_Primer  279089  279065
487      240O17_region_G3_256930_13_Forward_Primer  256850  256874
488      240O17_region_G3_256930_13_Reverse_Primer  257006  256982
489      240O17_region_G3_29646_14_Forward_Primer  29589  29613
490      240O17_region_G3_29646_14_Reverse_Primer  29746  29721
491      240O17_region_G3_29618_13_Forward_Primer  29589  29613
492      240O17_region_G3_29618_13_Reverse_Primer  29746  29721
493      240O17_region_G3_108561_14_Forward_Primer  108518  108542
494      240O17_region_G3_108561_14_Reverse_Primer  108675  108651
495      240O17_region_G3_143975_14_Forward_Primer  143939  143964
496      240O17_region_G3_143975_14_Reverse_Primer  144096  144072
497      240O17_region_G3_108431_20_Forward_Primer  108362  108386
498      240O17_region_G3_108431_20_Reverse_Primer  108520  108497
499      240O17_region_G3_281764_11_Forward_Primer  281645  281669
500      240O17_region_G3_281764_11_Reverse_Primer  281803  281779
501      240O17_region_G3_130058_15_Forward_Primer  129994  130018
502      240O17_region_G3_130058_15_Reverse_Primer  130153  130129
503      240O17_region_G3_310590_52_Forward_Primer  310533  310557
504 


240017 region G3 310590 52 Reverse^Primer 


310692 


310668 
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Seq Nam 


Seq ID 


location of primer ] 
on contig start 


ocation of primer 
on contig end 


505 


240017 reeion G3 313405 14 Forward Primer 


313345 


313369 


506 


740017 region G3 313405 14 Reverse Primer 


313505 


313481 


507 


240017 region G3 302190 13 Forward„Primer 


302093 


302119 


508 


240017 region G3 302190 13 Reverse Primer 


302253 


302229 


509 


240017 region G3 225343 17 Forward Primer 


225315 


225338 


510 


240O17 region G3 225343 17 Reverse Primer 


225475 


225451 


511 


240017 region G3 208823.._14 Forward Primer 


208760 


208784 


512 


240017 region G3 208823 14 Reverse Primer 


208921 


208897 


513 


240017 region G3 74285 11 Forward Primer 


74220 


74244 


514 


240017 region G3 74285 11 Reverse Primer 


74382 


74358 


515 


240017 region G3 109052 16 Forward Primer 


108999 


109023 


516 


240017 region G3 109052 16 Reverse Primer 


109161 


109137 


517 


240017 region G3 6395 12 Forward Primer 


6285 


6309 


518 


240017 reeion G3 6395 12 Reverse Primer 


6447 


6423 


519 


240017 region G3 244905 16 Forward_Primer 


244865 


244890 


520 


240017 region G3 244905 16 Reverse Primer 


245028 


245004 


521 


240017 redon G3 244956 13 Forward Primer 


244865 


244890 


522 


240017 region G3 244956 13 Reverse Primer 


245028 


245004 


523 


240017 region G3 117220 13 Forward Primer 


117175 


117199 


524 


740017 reirion G3 117220 13 Reverse Primer 


117339 


117315 


525 


240017 resion G3 134707 14 Forward Primer 


134584 


134608 


526 


240017 region G3 134707 14 Reverse_Primer 


134749 


134725 


527 


240017 region G3 35078 13 Forward Primer 


34990 


35013 


528 


240017 region G3 35078 13 Reverse Primer 


35157 


35133 


529 


240017 region G3 210506 16 Forward Primer 


210477 


210501 


530 


240017 region G3 210506 16 Reverse_Primer 


210644 


210620 


531 


240017 region G3 116961 26 Forward Primer 


116885 


116909 


532 


240017 region G3 116961 26 Reverse Primer 


117053 


117029 


533 


240017 region G3 51073 13 Forward Primer 


50979 


51003 


534 


240017 region G3 51073 13 Reverse Primer 


51147 


51123 


535 


240017 region G3 55291 15 Forward Primer 


55164 


55188 


536 


240017 region G3 55291 15 Reverse Primer 


55333 


55309 


537 


240017 region G3 229651 18 Forward Primer 


229615 


229639 


538 


240017 region G3 229651 18 Reverse Primer 


229784 


229760 


539 


240017 region G3 303308 19 Forward Primer 


303284 


303307 


540 


240017 region G3 303308 19 Reverse Primer 


303454 


303429 


541 


240017 region G3 168373 20 Forward Primer 


168262 


168286 


542 


240017 region G3 168373 20 Reverse Primer 


168432 


168408 


543 


240017 region G3 253333 17 Forward Primer 


253257 


253281 


544 


240017 region G3 253333 17 Reverse Primer 


253428 


253404 


545 


240017 region G3 5791 13 Forward Primer 


5766 


5790 


546 


240017 region G3 5791 13 Reverse Primer 


5937 


5912 


547 


240017 region G3 206841 19 Forward Primer 


206821 


206840 


548 


240017 region G3 206841 19 Reverse Primer 


206993 


206969 


549 


240017 region G3 202827 12_Forward_Primer


202782 


202806


550 


240017 region G3 202827 12 Reverse_Primer 


202956 


202932 


551 


240017 region G3 322656 13 Forward_Primer 


322572 


322598 


552 


240017 region G3 322656 13 Reverse_Primer 


322748 


322724 


553 


240017 region G3 111841 14 Forward Primer 


111709 


111733


Seq Num


Seq ID 


location of primer 
on contig start 


location of primer 
on contig end 


554 


240017 region G3 111841 14 Reverse_Primer 


111886 


111861 


555 


240017 region G3 192719 13 Forward_Primer 


192589 


192613 


556 


240017 region G3 192719 13 Reverse_Primer 


192767 


192743 


557 


240017 region G3 195630 17 Forward_Primer


195490 


195514 


558 


240017 region G3 195630 17_Reverse_Primer


195672 


195648 


559 


240017 region G3 69999 13_Forward_Primer


69858 


69881 


560 


240017 region G3 69999_13_Reverse_Primer


70040 


70016 


561 


240017 region G3 11176 13 Forward_Primer 


11060 


11084 


562 


240017 region G3 11176 13 Reverse_Primer 


11243 


11219 


563 


240017 region G3 228643 13 Forward_Primer


228529 


228553 


564 


240017 region G3 228643 13 Reverse_Primer 


228713 


228689 


565 


240017 region G3 88478 19 Forward_Primer 


88378 


88402 


566 


240017 region G3 88478 19 Reverse_Primer 


88562 


88538 


567 


240017 region G3 108950 13_Forward_Primer 


108838 


108858 


568 


240017 region G3 108950 13_Reverse_Primer


109023 


108998 


569 


240017 region G3 121054 14 Forward_Primer


120911 


120935 


570 


240017 region G3 121054 14_Reverse_Primer


121096 


121072 


571 


240017 region G3 188337 14_Forward_Primer 


188204 


188228 


572 


240017 region G3 188337 14 Reverse_Primer 


188391 


188367 


573 


240017 region G3 255944 21 Forward_Primer 


255879 


255903 


574 


240017 region G3 255944 21 Reverse_Primer 


256068 


256044 


575 


240017 region G3 219518 14_Forward_Primer


219420 


219444 


576 


240017 region G3 219518 14 Reverse_Primer 


219609 


219585 


577 


240017 region G3_235601_15_Forward_Primer


235483 


235507 


578 


240017 region G3 235601 15_Reverse_Primer


235673 


235649 


579 


240017 region G3 301529 13 Forward_Primer 


301498 


301522 


580 


240017 region G3 301529 13_Reverse_Primer


301689 


301665 


581 


240017 region G3 94795 14_Forward_Primer


94735 


94756 


582 


240017 region G3 94795 14 Reverse_Primer 


94929 


94905 


583 


240017 region G3_46703_23_Forward_Primer


46676 


46700 


584 


240017 region G3 46703 23 Reverse_Primer 


46870 


46846 


585 


240017 region G3 59616 14 Forward_Primer 


59539 


59563 


586 


240017 region G3 59616 14 Reverse_Primer 


59738 


59714 


587 


240017 region G3 296933 15 Forward_Primer 


296908 


296932 


588 


240017 region G3 296933 15 Reverse_Primer


297113 


297089 


589 


240017 region G3 192428 17_Forward_Primer 


192402 


192426 


590 


240017 region G3 192428 17 Reverse_Primer 


192613 


192589 


591 


240017 region G3 191490 14_Forward_Primer


191332 


191356 


592 


240017 region G3 191490 14 Reverse Primer 


191544 


191520 


593 


240017 region G3 201115 11 Forward_Primer 


200994 


201018 


594 


240017 region G3 201115 11 Reverse Primer 


201214 


201189 


595 


240017 region G3 72882 15 Forward_Primer 


72848 


72874 


596 


240017 region G3 72882 15 Reverse_Primer 


73068 


73042 


597 


240017_region_G3_69514_13_Forward_Primer


69411 


69437 


598 


240017_region_G3_69514_13_Reverse_Primer


69632 


69608 


599 


240017 region G3 37699 47 Forward_Primer 


37601 


37625 


600 


240017 region G3 37699 47 Reverse_Primer 


37827 


37802 


601 


240017_region_G3_11301_29_Forward_Primer


11274 


11300


602 


240017 region G3 11301 29_Reverse_Primer


11501 


11477 


603 


240017 region G3 141875 12_Forward_Primer 


141729 


141750 


604 


240017 region G3 141875_12_Reverse_Primer


141964 


141939 


605 


240017 region G3 98090 18_Forward_Primer 


98037 


98062 


606 


240017 region G3 98090_18_Reverse_Primer


98274 


98250 


607 


240017 region G3 43298 35 Forward_Primer


43144 


43168 


608 


240017 region G3 43298 35_Reverse_Primer 


43387 


43363 


609 


240017 region G3 262094 11_Forward_Primer


261989 


262014 


610 


240017 region G3 262094_11_Reverse_Primer


262236 


262211 


611 


240017 region G3 262079_15_Forward_Primer 


261989 


262014 


612 


240017 region G3 262079 15 Reverse_Primer 


262236 


262211 


613 


240017 region G3 59090 12_Forward_Primer 


58986 


59012 


614 


240017 region G3 59090 12 Reverse_Primer 


59248 


59224 


615 


240017 region G3 245723 13_Forward_Primer 


245502 


245526 


616 


240017 region G3 245723 13_Reverse_Primer


245766 


245742 


617 


240017 region G3 194628_54_Forward_Primer 


194581 


194607 


618 


240017 region G3 194628 54 Reverse_Primer


194846 


194822 


619 


240017 region G3 4566 16 Forward_Primer 


4455 


4479 


620 


240017 region G3 4566 16_Reverse_Primer 


4722 


4696 


621 


240017 region G3 96209 14 Forward_Primer 


96119 


96143 


622 


240017 region G3 96209 14_Reverse_Primer


96392 


96368 


623 


240017 region G3 248715 17_Forward_Primer


248633 


248657 


624 


240017 region G3 248715_17_Reverse_Primer


248906 


248882 


625 


240017 region G3 71410 40_Forward_Primer


71357 


71379 


626 


240017 region G3 71410 40_Reverse_Primer


71636 


71611 


627 


240017 region G3 226519 13_Forward_Primer 


226315 


226339 


628 


240017 region G3 226519 13_Reverse_Primer 


226598 


226574 


629 


240017 region G3 11282 19_Forward_Primer


11217 


11242 


630 


240017 region G3 11282 19_Reverse_Primer 


11501 


11477 


631 


240017 region G3 170504 12_Forward_Primer 


170409 


170433 


632 


240017 region G3 170504_12_Reverse_Primer


170694 


170671 


633 


240017 region G3 40864 14 Forward_Primer 


40652 


40678 


634 


240017 region G3 40864 14 Reverse_Primer 


40938 


40912 


635 


240017 region G3 13529 14 Forward_Primer 


13332 


13356 


636 


240017 region G3 13529 14 Reverse_Primer 


13622 


13598 


637 


240017 region G3 22858 14 Forward_Primer 


22675 


22699 


638 


240017 region G3 22858 14_Reverse_Primer 


22966 


22942 


639 


240017 region G3 309211_13_Forward_Primer


309092 


309118 


640 


240017 region G3 309211 13 Reverse_Primer 


309383 


309358 


641 


240017 region G3 55568_26_Forward_Primer 


55375 


55399 


642 


240017 region G3 55568 26 Reverse_Primer 


55667 


55642 


643 


240017 region G3 73238 16 Forward_Primer 


73043 


73069 


644 


240017 region G3 73238 16 Reverse_Primer 


73342 


73318 


645 


240017_region_G3_52488_19_Forward_Primer


52413 


52437 


646 


240017 region G3 52488 19 Reverse_Primer 


52712 


52688 


647 


318013 region A3 471518 14_Forward_Primer_Seq


471464 


471488 


648 


318013 region A3 471518_14_Reverse_Primer_Seq


471567 


471541 


649 


318013 region A3 231599 23 Forward_Primer_Seq 


231568 


231592 


650 


318013 region A3 231599 23 Reverse_Primer_Seq 


231672 


231651


651


318013 region A3 375912 13_Forward_Primer_Seq 


375845 


375865 


652 


318013 region A3 375912 13 Reverse_Primer_Seq 


375954 


375932 


653 


318013 region A3 180013_12_Forward_Primer_Seq


179951 


179974 


654 


318013 region A3 180013 12_Reverse_Primer_Seq 


180060 


180038 


655 


318013 region A3 171606 14_Forward_Primer_Seq 


171545 


171569 


656 


318013 region A3 171606 14_Reverse_Primer_Seq


171657 


171633 


657 


318013 region A3 416256_13_Forward_Primer_Seq


416180 


416203 


658 


318013 region A3 416256_13_Reverse_Primer_Seq


416293 


416269 


659 


318013 region A3 231395_15_Forward_Primer_Seq 


231339 


231363 


660 


318013 region A3 231395_15_Reverse_Primer_Seq


231461 


231438 


661 


318013 region A3 5502_47_Forward_Primer_Seq


5461 


5485 


662 


318013 region A3 5502 47 Reverse_Primer_Seq


5585 


5561 


663 


318013 region A3 93061 14_Forward_Primer_Seq


92988 


93012 


664 


318013 region A3 93061 14_Reverse_Primer_Seq 


93112 


93090 


665 


318013 region A3 111684 19_Forward_Primer_Seq 


111646 


111670 


666 


318013 region A3 111684_19_Reverse_Primer_Seq


111772 


111748 


667 


318013 region A3 69328 14_Forward_Primer_Seq 


69246 


69269 


668 


318013 region A3 69328 14 Reverse_Primer_Seq


69373 


69349 


669 


318013 region A3 36529 17 Forward_Primer_Seq 


36488 


36512 


670 


318013 region A3 36529 17_Reverse_Primer_Seq 


36617 


36593 


671 


318013 region A3 139128_12_Forward_Primer_Seq 


139043 


139067 


672 


318013 region A3 139128 12_Reverse_Primer_Seq


139174 


139150 


673 


318013 region A3 495674 13 Forward_Primer_Seq


495592 


495616 


674 


318013 region A3 495674_13_Reverse_Primer_Seq


495723 


495699 


675 


318013 region A3 187577 13_Forward_Primer_Seq 


187482 


187506 


676 


318013 region A3 187577 13 Reverse_Primer_Seq


187613 


187590 


677 


318013 region A3 453036 14_Forward_Primer_Seq 


452999 


453023 


678 


318013 region A3 453036 14 Reverse_Primer_Seq 


453132 


453108 


679 


318013 region A3 374041 13_Forward_Primer_Seq 


373964 


373988 


680 


318013 region A3 374041 13 Reverse_Primer_Seq 


374097 


374073 


681 


318013 region A3 3412 11 Forward Primer_Seq 


3319 


3341 


682 


318013 region A3 3412 11 Reverse_Primer_Seq


3454 


3430 


683 


318013 region A3 276495 28 Forward_Primer_Seq


276462 


276485 


684 


318013 region A3 276495_28_Reverse_Primer_Seq


276598 


276574 


685 


318013 region A3 151839 17 Forward_Primer_Seq


151744 


151768 


686 


318013 region A3 151839 17_Reverse_Primer_Seq 


151882 


151858 


687 


318013 region A3 292912 12_Forward_Primer_Seq


292875 


292899 


688 


318013 region A3 292912 12_Reverse_Primer_Seq 


293014 


292990 


689 


318013 region A3 104560 12_Forward_Primer_Seq


104464 


104488 


690 


318013 region A3 104560 12 Reverse_Primer_Seq 


104604 


104580 


691 


318013 region A3 65193 11 Forward Primer_Seq 


65155 


65179 


692 


318013 region A3_65193_11_Reverse_Primer_Seq


65295 


65271 


693 


318013 region A3 110573 70_Forward_Primer_Seq 


110533 


110559 


694 


318013 region A3 110573 70_Reverse_Primer_Seq 


110674 


110648 


695 


318013 region A3 65117 12 Forward_Primer_Seq 


65034 


65058 


696 


318013 region A3 65117 12_Reverse_Primer_Seq 


65177 


65153 


697 


318013 region A3 490837_16_Forward_Primer_Seq 


490762 


490786 


698 


318013 region A3 490837 16_Reverse_Primer_Seq 


490905 


490881 


699 


318013 region A3 107448 11_Forward_Primer_Seq


107385 


107411


700


318013 region A3 107448 11_Reverse_Primer_Seq


107529 


107505 


701 


318013 region A3 331 23_Forward_Primer_Seq 


276 


301 


702 


318013 region A3 331 23 Reverse_Primer_Seq


421 


397 


703 


318013 region A3 193470 13_Forward_Primer_Seq


193444 


193468 


704 


318013 region A3 193470 13_Reverse_Primer_Seq 


193589 


193565 


705 


318013 region A3 183305_14_Forward_Primer_Seq 


183239 


183263 


706 


318013 region A3 183305 14_Reverse_Primer_Seq 


183384 


183360 


707 


318013 region A3 55050_14_Forward_Primer_Seq


54998 


55022 


708 


318013 region A3 55050_14_Reverse_Primer_Seq 


55144 


55120 


709 


318013 region A3 224693 21 Forward_Primer_Seq 


224656 


224682 


710 


318013 region A3 224693 21_Reverse_Primer_Seq


224803 


224779 


711 


318013 region A3 207216_12_Forward_Primer_Seq 


207152 


207176 


712 


318013 region A3 207216_12_Reverse_Primer_Seq


207299 


207276 


713 


318013 region A3 4654_22_Forward_Primer_Seq


4612 


4636 


714 


318013 region A3 4654 22 Reverse Primer Seq 


4760 


4736 


715 


318013 region A3 408959 13 Forward_Primer_Seq 


408918 


408942 


716 


318013 region A3 408959_13_Reverse_Primer_Seq


409066 


409042 


717 


318013 region A3 132288 22_Forward_Primer_Seq


132192 


132216 


718 


318013 region A3 132288_22_Reverse_Primer_Seq 


132340 


132316 


719 


318013 region A3 292822_20_Forward_Primer_Seq 


292747 


292771 


720 


318013 region A3 292822_20_Reverse_Primer_Seq 


292895 


292871 


721 


318013 region A3 311076 12 Forward_Primer_Seq 


311027 


311051 


722 


318013 region A3 311076 12 Reverse_Primer_Seq


311175 


311152 


723 


318013 region A3 509623_13_Forward_Primer_Seq


509584 


509608 


724 


318013 region A3 509623_13_Reverse_Primer_Seq


509732 


509708 


725 


318013 region A3 190404_14_Forward_Primer_Seq


190358 


190382 


726 


318013 region A3 190404 14_Reverse_Primer_Seq 


190506 


190482 


727 


318013 region A3 164916_15_Forward_Primer_Seq


164808 


164832 


728 


318013 region A3 164916 15_Reverse_Primer_Seq 


164957 


164933 


729 


318013 region A3 21028 13 Forward_Primer_Seq 


21001 


21026 


730 


318013 region A3 21028 13 Reverse_Primer_Seq 


21150 


21126 


731 


318013 region A3 208012 17_Forward_Primer_Seq 


207955 


207979 


732 


318013 region A3 208012_17_Reverse_Primer_Seq 


208104 


208085 


733 


318013 region A3 484089_14_Forward_Primer_Seq


484036 


484060 


734 


318013 region A3 484089_14_Reverse_Primer_Seq 


484185 


484161 


735 


318013 region A3 332780_17_Forward_Primer_Seq


332723 


332747 


736 


318013 region A3 332780 17_Reverse_Primer_Seq 


332872 


332853 


737 


318013 region A3 480137_37_Forward_Primer_Seq


480059 


480084 


738 


318013 region A3 480137_37_Reverse_Primer_Seq


480208 


480182 


739 


318013 region A3 441056 14 Forward_Primer_Seq 


441011 


441035 


740 


318013 region A3 441056 14_Reverse_Primer_Seq 


441161 


441138 


741 


318013 region A3 77486_11_Forward_Primer_Seq


77447 


77471 


742 


318013 region A3 77486 11_Reverse_Primer_Seq


77597 


77573 


743 


318013 region A3 272468 11 Forward_Primer_Seq 


272423 


272447 


744 


318013 region A3 272468 11_Reverse_Primer_Seq


272573 


272549 


745 


318013 region A3 425319_17_Forward_Primer_Seq 


425233 


425257 


746 


318013 region A3 425319 17_Reverse_Primer_Seq 


425383 


425359 


747 


318013 region A3 413879_31_Forward_Primer_Seq 


413835 


413859 


748 


318013 region A3 413879 31 Reverse_Primer_Seq 


413985 


413961


749 


318013 region A3 80477 64_Forward_Primer_Seq


80440 


80464 


750 


318013 region A3 80477 64_Reverse_Primer_Seq


80591 


80567 


751 


318013 region A3 277272 50_Forward_Primer_Seq 


277213 


277237 


752 


318013 region A3 277272 50 Reverse_Primer_Seq


277364 


277340 


753 


318013 region A3 509642 13 Forward Primer Seq 


509604 


509628 


754 


318013 region A3 509642 13 Reverse Primer Seq 


509755 


509731 


755 


318013 region A3 321771 14 Forward_Primer_Seq 


321663 


321687 


756 


318013 region A3 321771 14_Reverse_Primer_Seq


321815 


321791 


757 


318013 region A3 26788_12_Forward_Primer_Seq 


26734 


26758 


758 


318013 region A3 26788 12 Reverse Primer Seq 


26886 


26862 


759 


318013 region A3 262706 16_Forward_Primer_Seq 


262649 


262673 


760 


318013 region A3 262706 16_Reverse_Primer_Seq


262802 


262778 


761 


318013 region A3 243928 16 Forward Primer Seq 


243891 


243915 


762 


318013 region A3 243928 16 Reverse_Primer_Seq


244044 


244020 


763 


318013 region A3 23246 148_Forward_Primer_Seq


23215 


23239 


764 


318013 region A3 23246 148 Reverse_Primer_Seq 


23368 


23344 


765 


318013 region A3 165406 12 Forward Primer Seq 


165367 


165391 


766 


318013 region A3 165406 12 Reverse_Primer_Seq 


165521 


165497 


767 


318013 region A3 486294 14_Forward_Primer_Seq


486208 


486232 


768 


318013 region A3 486294 14 Reverse Primer Seq 


486362 


486338 


769 


318013 region A3 46754 12 Forward_Primer_Seq 


46661 


46685 


770 


318013 region A3 46754 12 Reverse Primer Seq 


46816 


46792 


771 


318013 region A3 381116 15 Forward_Primer_Seq 


381080 


381104 


772 


318013 region A3 381116 15 Reverse_Primer_Seq


381235 


381211 


773 


318013 region A3 350369 11_Forward_Primer_Seq


350295 


350319 


774 


318013 region A3 350369_11_Reverse_Primer_Seq


350450 


350426 


775 


318013 region A3 138841 13_Forward_Primer_Seq 


138795 


138819 


776 


318013 region A3 138841 13 Reverse Primer Seq 


138950 


138926 


777 


318013 region A3 12158 142 Forward Primer Seq 


12117 


12141 


778 


318013 region A3 12158 142_Reverse_Primer_Seq 


12272 


12248 


779 


318013 region A3 315368 13_Forward_Primer_Seq


315310 


315334 


780 


318013 region A3 315368 13 Reverse_Primer_Seq


315465 


315441 


781 


318013 region A3 307549 13_Forward_Primer_Seq 


307464 


307488 


782 


318013 region A3 307549 13_Reverse_Primer_Seq


307619 


307595 


783 


318013 region A3 159857 14_Forward_Primer_Seq


159772 


159796 


784 


318013 region A3 159857_14_Reverse_Primer_Seq 


159928 


159904 


785 


318013 region A3 140551 15 Forward_Primer_Seq 


140454 


140478 


786 


318013 region A3 140551_15_Reverse_Primer_Seq


140610 


140586 


787 


318013 region A3 279869 11_Forward_Primer_Seq


279797 


279821 


788 


318013 region A3 279869 11_Reverse_Primer_Seq


279953 


279929 


789 


318013 region A3 78292_35_Forward_Primer_Seq


78265 


78291 


790 


318013 region A3 78292_35_Reverse_Primer_Seq


78422 


78397 


791 


318013 region A3 185019 12 Forward Primer Seq 


184953 


184977 


792 


318013 region A3 185019 12 Reverse_Primer_Seq


185111 


185087 


793 


318013 region A3 409164 13_Forward_Primer_Seq 


409082 


409106 


794 


318013 region A3 409164_13_Reverse_Primer_Seq 


409240 


409219 


795 


318013 region A3 75392 14_Forward_Primer_Seq


75287 


75311 


796 


318013 region A3 75392_14_Reverse_Primer_Seq


75445 


75421 


797 


318013 region A3 231320_12_Forward_Primer_Seq


231269 


231293


798


318013 region A3 231320 12 Reverse Primer Seq 


231429 


231405 


799 


318013 region A3 381102 14_Forward_Primer_Seq


381041 


381064 


800 


318013 region A3 381102 14 Reverse_Primer_Seq 


381201 


381176 


801 


318013 region A3 491826 15 Forward_Primer_Seq 


491753 


491777 


802 


318013 region A3 491826 15 Reverse Primer Seq 


491914 


491891 


803 


318013 region A3 56365 21 Forward Primer Seq 


56336 


56360 


804 


318013 region A3 56365 21 Reverse_Primer_Seq


56497 


56473 


805 


318013 region A3 372628 15 Forward Primer Seq 


372554 


372578 


806 


318013 region A3 372628 15_Reverse_Primer_Seq 


372715 


372691 


807 


318013 region A3 217037 11 Forward Primer Seq 


216919 


216943 


808 


318013 region A3 217037 11 Reverse Primer Seq


217081 


217057 


809 


318013 region A3 302609 11 Forward Primer Seq 


302575 


302599 


810 


318013 region A3 302609 11 Reverse Primer Seq 


302737 


302713 


811 


318013 region A3 341804 11 Forward_Primer_Seq 


341686 


341710 


812 


318013 region A3 341804 11 Reverse_Primer_Seq 


341848 


341824 


813 


318013 region A3 264929 68_Forward_Primer_Seq 


264862 


264886 


814 


318013 region A3 264929 68 Reverse_Primer_Seq 


265024 


265000 


815 


318013 region A3 55499 12 Forward_Primer_Seq


55400 


55424 


816 


318013 region A3 55499 12 Reverse_Primer_Seq 


55563 


55539 


817 


318013 region A3 295634 14 Forward_Primer_Seq 


295538 


295562 


818 


318013 region A3 295634 14 Reverse Primer Seq 


295702 


295677 


819 


318013 region A3 269358 15 Forward Primer Seq 


269242 


269266 


820 


318013 region A3 269358 15 Reverse Primer_Seq 


269406 


269382 


821 


318013 region A3 457009 24_Forward_Primer_Seq 


456924 


456948 


822 


318013 region A3 457009_24_Reverse_Primer_Seq


457088 


457064 


823 


318013 region A3 176598 14 Forward Primer Seq 


176554 


176578 


824 


318013 region A3 176598 14 Reverse Primer Seq 


176718 


176694 


825 


318013 region A3 278266 12_Forward_Primer_Seq


278210 


278234 


826 


318013 region A3 278266 12_Reverse_Primer_Seq 


278376 


278350 


827 


318013 region A3 391810_12_Forward_Primer_Seq


391683 


391707 


828 


318013 region A3 391810_12_Reverse_Primer_Seq 


391851 


391826 


829 


318013 region A3 269485 15_Forward_Primer_Seq 


269383 


269407 


830 


318013 region A3 269485 15_Reverse_Primer_Seq


269551 


269527 


831 


318013 region A3 359247_17_Forward_Primer_Seq


359218 


359243 


832 


318013 region A3 359247 17_Reverse_Primer_Seq 


359386 


359362 


833 


318013 region A3 315094_13_Forward_Primer_Seq 


314976 


315002 


834 


318013 region A3 315094 13_Reverse_Primer_Seq


315145 


315120 


835 


318013 region A3 307823 13 Forward_Primer_Seq 


307784 


307809 


836 


318013 region A3 307823 13 Reverse_Primer_Seq 


307953 


307927 


837 


318013 region A3 248588 15 Forward_Primer_Seq 


248540 


248564 


838 


318013 region A3 248588 15_Reverse_Primer_Seq


248709 


248684 


839 


318013 region A3 252426 85_Forward_Primer_Seq


252398 


252423 


840 


318013 region A3 252426 85_Reverse_Primer_Seq 


252568 


252543 


841 


318013 region A3 513314 16_Forward_Primer_Seq 


513209 


513233 


842 


318013 region A3 513314 16 Reverse_Primer_Seq 


513379 


513355 


843 


318013 region A3 68183 14_Forward_Primer_Seq


68108 


68132 


844 


318013 region A3 68183 14_Reverse_Primer_Seq 


68279 


68255


845


318013 region A3 471191 13 Forward Primer Seq 


471059 


471083 


846 


318013 region A3 471191 13 Reverse Primer Seq 


471231 


471206 


847 


318013 region A3 163547 18 Forward Primer Seq 


163459 


163483 


848 


318013 region A3 163547 18 Reverse Primer Seq


163632 


163608 


849 


318013 region A3 417867 15 Forward Primer Seq 


417839 


417863 


850 


318013 region A3 417867 15 Reverse Primer Seq 


418014 


417990 


851 


318013 region A3 332465 14 Forward Primer Seq


332346 


332370 


852 


318013 region A3 332465 14 Reverse Primer Seq 


332523 


332499 


853 


318013 region A3 207697 14 Forward Primer Seq 


207578 


207602 


854 


318013 region A3 207697 14 Reverse Primer Seq 


207755 


207731 


855 


318013 region A3 277229 43 Forward Primer Seq 


277186 


277210 


856 


318013 region A3 277229 43 Reverse_Primer_Seq 


277364 


277340 


857 


318013 region A3 36366 11 Forward_Primer_Seq 


36323 


36345 


858 


318013 region A3 36366 11 Reverse Primer Seq 


36501 


36477 


859 


318013 region A3 91970 12 Forward_Primer_Seq 


91938 


91962 


860 


318013 region A3 91970 12 Reverse Primer Seq 


92116 


92091 


861 


318013 region A3 211533 11 Forward Primer Seq 


211406 


211430 


862 


318013 region A3 211533 11 Reverse_Primer_Seq


211585 


211561 


863 


318013 region A3 336301 11 Forward Primer Seq 


336174 


336198 


864 


318013 region A3 336301 11 Reverse Primer Seq 


336353 


336329 


865 


318013 region A3 441603 14 Forward_Primer_Seq 


441539 


441563 


866 


318013 region A3 441603 14 Reverse Primer Seq 


441718 


441694 


867 


318013 region A3 468354 15 Forward_Primer_Seq 


468263 


468287 


868 


318013 region A3 468354 15 Reverse Primer Seq 


468442 


468418 


869 


318013 region A3 188983 18 Forward Primer Seq 


188855 


188879 


870 


318013 region A3 188983 18 Reverse Primer Seq 


189035 


189009 


871 


318013 region A3 115502 17 Forward Primer Seq


115469 


115493 


872 


318013 region A3 115502 17 Reverse_Primer_Seq 


115649 


115625 


873 


318013 region A3 163006 13 Forward_Primer_Seq


162972 


162996 


874 


318013 region A3 163006 13 Reverse Primer Seq 


163153 


163129 


875 


318013 region A3 119283 14 Forward Primer Seq 


119199 


119224 


876 


318013 region A3 119283 14 Reverse Primer Seq 


119381 


119357 


877 


318013 region A3 491126 11_Forward_Primer_Seq


491062 


491086 


878 


318013 region A3 491126 11 Reverse_Primer_Seq 


491244 


491220 


879 


318013 region A3 99512 21 Forward Primer_Seq 


99398 


99422 


880 


318013 region A3 99512 21 Reverse Primer Seq 


99581 


99557 


881 


318013 region A3 280291 17_Forward_Primer_Seq 


280201 


280226 


882 


318013 region A3 280291 17_Reverse_Primer_Seq


280385 


280361 


883 


318013 region A3 138443 19_Forward_Primer_Seq 


138304 


138329 


884 


318013 region A3 138443 19 Reverse Primer Seq 


138488 


138465 


885 


318013 region A3 115973 14 Forward_Primer_Seq 


115832 


115856 


886 


318013 region A3 115973 14 Reverse_Primer_Seq 


116016 


115992 


887 


318013 region A3 329977 14 Forward_Primer_Seq


329864 


329889 


888 


318013 region A3 329977 14_Reverse_Primer_Seq 


330050 


330026 


889 


318013 region A3 205203 14 Forward_Primer_Seq 


205090 


[illegible]


890 


318013 region A3 205203 14 Reverse_Primer_Seq 


205276 


205252 


891 


318013 region A3 153114 12 Forward Primer Seq 


152969 


152993 


892 


318013 region A3 153114 12 Reverse Primer Seq 


153156 


153132 


893 


318013 region A3 34581 13 Forward Primer Seq 


34523 


34547


894


318013 region A3 34581 13 Reverse_Primer_Seq




34688 


895 


318013 region A3 292577 19 Forward_Primer_Seq


292549 


292571


896 


318013 region A3 292577 19 Reverse_Primer_Seq


292739 


292715 


897 


318013 region A3 445391 20 Forward_Primer_Seq


445356 


445382 


898 


318013 region A3 445391 20 Reverse_Primer_Seq


445547 


445523 


899 


318013 region A3 350540 17 Forward_Primer_Seq


350421 


350445 


900 


318013 region A3 350540 17 Reverse_Primer_Seq


350612 


350588 


901 


318013 region A3 453879 15 Forward_Primer_Seq


453725 


453750 


902 


318013 region A3 453879 15 Reverse_Primer_Seq


453918 


453894 


903 


318013 region A3 201246 11 Forward_Primer_Seq


201128


201153 


904 


318013 region A3 201246 11 Reverse_Primer_Seq


201321 


201297 




905


318013 region A3 326020 13 Forward_Primer_Seq


325902


325927 


906 


318013 region A3 326020 13 Reverse_Primer_Seq


326095 


326071 


907 


318013 region A3 503801 14 Forward_Primer_Seq




503680 


908


318013 region A3 503801 14 Reverse_Primer_Seq


503849 


503823 


909 


318013 region A3 109400 59 Forward_Primer_Seq


102981 


102107 




910


318013 region A3 109400 59 Reverse_Primer_Seq


109481 


109456 




911


318013 region A3 448857 15 Forward_Primer_Seq


[illegible]


448779


912


318013 region A3 448857 15 Reverse_Primer_Seq


448947


448924


913


318013 region A3 48364 14 Forward_Primer_Seq


48232


48256


914


318013 region A3 48364 14 Reverse_Primer_Seq


[illegible]


[illegible]


915 


318013 region A3 [illegible] Forward_Primer_Seq


[illegible]


[illegible]


916


318013 region A3 [illegible] Reverse_Primer_Seq


[illegible]


[illegible]


917


318013 region A3 [illegible] Forward_Primer_Seq


[illegible]


[illegible]


918


318013 region A3 [illegible] Reverse_Primer_Seq


[illegible]


[illegible]


919


318013 region A3 194717 14 Forward_Primer_Seq


[illegible]


[illegible]


920


318013 region A3 194717 14 Reverse_Primer_Seq




[illegible]


921


318013 region A3 [illegible] Forward_Primer_Seq


[illegible]


[illegible]


922


318013 region A3 [illegible] Reverse_Primer_Seq


[illegible]


194899


923 


318013 region A3 [illegible] Forward_Primer_Seq


461218


461242


924


318013 region A3 [illegible] Reverse_Primer_Seq


461426


461402


925 


318013 region A3 [illegible] Forward_Primer_Seq


[illegible]


64891


926


318013 region A3 [illegible] Reverse_Primer_Seq


65011


64987


927


318013 region A3 [illegible] Forward_Primer_Seq


[illegible]


[illegible]


928


318013 region A3 [illegible] Reverse_Primer_Seq


[illegible]


[illegible]


929


318013 region A3 [illegible] Forward_Primer_Seq


[illegible]


[illegible]


930


318013 region A3 [illegible] Reverse_Primer_Seq


[illegible]


[illegible]


931


318013 region A3 [illegible] Forward_Primer_Seq


[illegible]




932


318013 region A3 [illegible] Reverse_Primer_Seq


[illegible]


[illegible]


933


318013 region A3 [illegible] Forward_Primer_Seq




[illegible]


934


318013 region A3 [illegible] Reverse_Primer_Seq


[illegible]




935


318013 region A3 [illegible] Forward_Primer_Seq




909571


936


318013 region A3 [illegible] Reverse_Primer_Seq


700771


909747


937 


318013 region A3 187096 17 Forward_Primer_Seq


[illegible]


[illegible]


938 


318013_region_A3_187096_17_Reverse_Primer_Seq


187282 


187257 


939 


318013_region_A3_381693_13_Forward_Primer_Seq


381658 


381683 


940 


318013_region_A3_381693_13_Reverse_Primer_Seq


381885 


381863 


941 


318013_region_A3_361286_33_Forward_Primer_Seq


361173 


361197 


942 


318013_region_A3_361286_33_Reverse_Primer_Seq


361401 


361376 
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IrifiitioT] of nrimer 
nn rnntiff pud 


943      318013_region_A3_482668_14_Forward_Primer_Seq     482592              482616
944      318013_region_A3_482668_14_Reverse_Primer_Seq     482821              482796
945      318013_region_A3_128002_12_Forward_Primer_Seq     127882              127906
946      318013_region_A3_128002_12_Reverse_Primer_Seq     128112              128087
947      318013_region_A3_499270_14_Forward_Primer_Seq     499184              499208
948      318013_region_A3_499270_14_Reverse_Primer_Seq     499422              499398
949      318013_region_A3_231650_12_Forward_Primer_Seq     231568              231592
950      318013_region_A3_231650_12_Reverse_Primer_Seq     231809              231788
951      318013_region_A3_199851_13_Forward_Primer_Seq     199762              199786




318013 rf*(rion A3 199851 13 Reverse Primer Sen 


200012 


199988 




'^IRC)}'^ r^*ainr^ A'^ 394690 1 Forward Primpr *5pn 


^94540 


^94564 




31 SOI rpcrion A^ 394699 1'^ Rpverse Primer Sen 


324790 


394766 




31801^ recrinn A^ 374190 19 Forward Primer Sen 


374129 


374159 


956      318013_region_A3_374190_19_Reverse_Primer_Seq     374394              374370
957      318013_region_A3_460603_13_Forward_Primer_Seq     460450              460474
958      318013_region_A3_460603_13_Reverse_Primer_Seq     460715              460691
959      318013_region_A3_108681_14_Forward_Primer_Seq     108524              108548
960      318013_region_A3_108681_14_Reverse_Primer_Seq     108791              108768




^ISOl^ reaion A'^ 450701 47 Forward Primer Sen 


459639 


4^9663 




"^IROl^ rp'aion A'^ 4^0701 47 l?everce Primer Sen 


4^0007 


4^088^ 




'^ISni'^ reaion A'^ 49^7 90 Forwjirrl Primer Slen 


41 79 






'^ISDl'^ reaion A'^ 4957 90 Reverse Primer *sen 


44^0 


449 S 




'^ISm^ recrion Al 9'^SRIO 14 Forward Primer **len 




9^8S80 
Z/_) o~joy 


966 


31801^ recrinn A'^ O^RR^C\ 14 Reverse Primer Sen 


9'^88S0 


9^8896 


967 


31801'^ reaion A^ 945R17 14 Forward Primer Sen 


945713 


245738 


968 


318013 region A3 245817 14 Reverse Primer vSen 


246001 


245977 


969 


318013 reeion A3 245956 14 Forward Primer Sea 


245713 


245738 


Q7n 


318013 reoion A3 9459S6 14 Reverse Primer Sen 


946001 


945977 


Q71 


31801^ reaion A'^ 74148 14 Forward Vr\mer Sen 


740S0 


74075 


y f 1^ 


'^1801'^ rf^aion A^ 74148 14 Revf=r«(^ Primer Sen 


741^8 


74^14 

/ *+ J X*T 




^IROT^ reaion A'^ 7408Q 15 Forwarr! Primf*r Sen 


740^0 


7407^ 


Q74 

y I'-r 


'^1801^ reaion A^ 7408Q IS Reverse Primer Sen 
-7 X o w X D X Cgiu xi_/TiO / w o y i J .XV c vex OC X X IXXXCX O OL| 


74 '^'^ 8 


74^14 


97^ 


'^1801^ reaion A'^ 941 /S8/^ 19 Forw;ird Primer Sen 


941470 


941404 


976 


'^1801'^ reaion A'^ 941fi8fi 19 Reverse Primer Sen 

J X O W X D ^lCglVJXX_rt^ Ai'-r X \jO\J X ji__iVC V Ci x 1 XXXXCl OC^ 


94.1 76S 


941741 


977 


318013 reaion A3 47476 1? Forward Primer vSen 

■J X \J\J X XKf^lXJiX ^ i ^ I \J Xi-i J. \Jx Well \X X XXXXXt'X vJ^LJ 


47280 


47304 


978 


318013 reaion A3 47476 197 Reverse Primer Sen 

•J X \J\J X J X Vg,XWXX zT.fcJ r / *T / \J X ^ / _Xxt' V v^X ftt/ X X XXXX^X 


47577 


47554 


979 


318013 reaion A3 164550 19 Forward Primer Sen 


164323 


164347 


980 


318013 reaion A3 164550 19 Reverse Primer Sen 

D xovy X J 1 ^g.x*jxx, r\j xu'T^^.JV/ x ^ xvc v t/X at/ x xxxxi^x ot/L[ 


164691 


1 64598 




31801^^ reaion A3 101955 15 Forward Primer Sen 
J X o vy X J ^x c^^xuxi rvj x \j x u x »j x \ji w cxi u. x x xxxxcx vjcl| 


101 1 19 

XVX XX-? 


101 144 

xw X xtT- 


Q9.0 


'^1801'? reaion A'^ 101 9SS IS Peverce Primer Sen 


10141 8 


1 01 ^09 

x\jxDy4/ 




515009 reaion 09 1618Q 11 Forwnrr! Primer 


1 6144 


1 61 68 




S1SO09 rf*aion CV^ 161SQ 11 'Qf^xTt^vQf^ Primf^r 


1 6944 


1 6990 




S1SO09 r^^crion CY) 7109S Fnrwrnrrl Prim^^r 

J xD\J\ji^ region ^jf^ / lyz-o ij__r^orwdiu._r rimer 


71 R80 


71 OO^ 


986      515O02_region_G2_71925_13_Reverse_Primer          71987               71963
987      515O02_region_G2_4707_12_Forward_Primer           4660                4684
988      515O02_region_G2_4707_12_Reverse_Primer           4769                4743
989      515O02_region_G2_118904_18_Forward_Primer         118847              118871
990      515O02_region_G2_118904_18_Reverse_Primer         118957              118932
991      515O02_region_G2_13655_17_Forward_Primer          13567               13592
Seq Num  Seq ID                                            location of primer  location of primer
                                                           on contig start     on contig end
992      515O02_region_G2_13655_17_Reverse_Primer          13698               13673
993      515O02_region_G2_53900_13_Forward_Primer          53843               53867
994      515O02_region_G2_53900_13_Reverse_Primer          53985               53961
995      515O02_region_G2_8079_14_Forward_Primer           8023                8047
996      515O02_region_G2_8079_14_Reverse_Primer           8167                8143

QQ7 


51SO09 rpcrinn (19 0069 9f^ Forward Primer 


9917 


9941 


yyo 


S1SO09 refyinn G9 00<^0 9R Reverse Primer 


10062 


10038 


yyy 


S 1^009 reairtn G9 79^^08 77 Forward Primer 


72272 


72298 


1000 

L\J\J\J 


515009 region G9 7H08 77 Reverse Primer 


10062 


10038 


1001     515O02_region_G2_99475_19_Forward_Primer          99408               99433
1002     515O02_region_G2_99475_19_Reverse_Primer          99554               99530
1003     515O02_region_G2_118615_18_Forward_Primer         118512              118535
1004     515O02_region_G2_118615_18_Reverse_Primer         118658              118634
1005     515O02_region_G2_119001_46_Forward_Primer         118931              118956
1006     515O02_region_G2_119001_46_Reverse_Primer         119079              119055
1007     515O02_region_G2_118958_43_Forward_Primer         118931              118956


\-\J\JO 


515009 re pi on G9 11R95R 4^ Reverse Primer 


119079 


119055 


lOOQ 

x.\j\jy 


51SO09 rf^ainn rr9 17107 l'^ Forward Primer 
J X Jv-zv/ji ^iCg|iuix__vJ^ X 1 xy I xo X ^jx wculi ixxix^x 


17128 


17152 




SlSn09 rpoion CVJ 17107 1'^ Reverse Primer 


17276 


17252 


1 01 1 


S 1^009 rf^crion CVJ lOSIf^'^ 90 Forward Primer 

J 1 J WWZf ^1 C^lUJll_vJZ< J,UJ XUJ ^y JCKjl woiu ^rxxxxxcx 


105068 
x\ju yjyjo 


105092 


1019 


51SO09 region CD 10516^ 90 Reverse Primer 
»j u X v^^xvjxx VJZ. X \j J X \j J ji;7_x\.t v t-i oc- x xxxxc-x 


105217 


105192 


i,\J 1.J 


51^009 reoion Cr? 111^^5 1^ Forward Primer 


111308 


111332 


1014 


515009 region G2 1113^5 13 Reverse Primer 


111458 


111434 


1015 


515002 region G2 106396 13 Forward Primer 

^ X^^\^\J^ XvC^X^^XX X. \J\J^ ^ \J X 1^ X V/X VV tXX VX- _X X XXXXvX 


106318 


106342 


1016 


515O02 reeton G2 106396 13 Reverse Primer 


106469 


106445 


1017 


515O09 recrion G9 50990 17 Forward Primer 


59203 


59227 


101R 


515009 reaion G9 S0990 17 Reverse Primer 
J xj\j\jj^ iv^xtjxx V-i<-- ^y^^y x / rvcvt^xac ^x xxxxx^x 


59354 


59330 




^1^009 reaion G9 7^70S 90 Forward Primer 


73769 
I J 1 yjy 


73793 

/ J / yj 


1090 


S 1 Sn09 reaion G9 7'^79S 90 Reverse Primer 

J xO\J\jLi XCgXUll_VJ^ / D I yO iJyJ .xVt> V CX Ot/ x XXXXXtX 


73991 


73896 


1091 


S 1 Sn09 reaion G9 90 PorwarH Primer 

J X^\J\}Zj XCgXVJXl VJ^ OJxJU't £~\J_x\Jx WqXU. XTXiXXXtfX 




85611 


1022 


515009 reaion G9 8*i664 20 Reverse Primer 

J A ^ \J yj ^ X g X W XX vJ jLr \jO\J ZAJ Xvv- V tx X X X XXX^l 


85738 


85714 


1023 


515002 region G2 36921 17 Forward Primer 

— ' X*/>— l.\^^l\JKV VJji,/ .jyjjf^X X I J. \JX W tXX KX X X XXJ.XW1 


36830 


36854 


1024 


5 15Q02 region G2 36921 1 7_Reverse_Primer 


36983 


36959 


1025 


515002 reeion G2 124150 19 Forward Primer 

X*>y V^WArf _X V/^Xy/XX VJiO X X \J x. V-/X vt txx Vi. X. X XXXXV/X 


124073 


124096 


1026 


515002 region G2 124150 19 Reverse Primer 


124227 


124203 


1027 


5I5O02 region G2 5089 14 Forward Primer 


4999 


5024 


1028 


515002 reaion G2 5089 14 Reverse Primer 


5156 


5132 


102Q 

xyjj^y 


S15O09 reaion G9 5R921 15 Forward Primer 

^ X^\J\J£j XC-gXVJXl VJ^ ^OZ/^X x^ ^X VJX WCIXU. X xxxxx^x 


58197 

xy t 


58220 


10^0 


515009 reaion G9 5R991 15 Reverse Primer 

^ XJ Wv/Zi ^XCgxUXi vJjI^ ^O^^ix X J_XVt'VdoC X llXii^X 


58354 


58330 




J I J wuz_rcgiun_vjz ^^o i oy_ i _r^ur w oi ci_x liiiicr 


06099 


06046 


1 0^9 


uwvjz- rcgiun vjrz i iH_Jtvcvcroc_r^riiiicr 


061 R9 


061 5R 

y\j x^o 


1.\JDD 


515009 rpaion G9 70505 1^ PorwarH Primer 


70479 


70406 


10'^4 


515009 reaion G9 70595 13 Reverse Primer 
-J Xij\j\jLi it/g,xvjxx iKJ^y^j X >j x\\> V t>x Bt/ xixxi^x 


70634 


70608 


1035     515O02_region_G2_4340_15_Forward_Primer           4312                4337
1036     515O02_region_G2_4340_15_Reverse_Primer           4477                4454
1037     515O02_region_G2_90417_11_Forward_Primer          90335               90359
1038     515O02_region_G2_90417_11_Reverse_Primer          90503               90479
1039     515O02_region_G2_49711_17_Forward_Primer          49652               49676
1040     515O02_region_G2_49711_17_Reverse_Primer          49820               49796


1041 


SI son? reainn G9 6^05^ 1"^ Frirward Primer 




63029 


1 OdO 


SI son? r^^oirtn ^^'^^S'^ 1 Pf^vp^rcp^ Prim^»r 
J 1 *J ICftlUH VJZf \jD\jOD X v?_XvC VCl oC iTi. illlCl 


^^'^17'^ 

O J LID 


6^ IIS? 

UO IH'O 




SISOnO rAoinn f^O (^"^Cnfs. ^A Prvrwnrrl Primer 




f»^09Q 




0 region \jz OjU/d i4 j\& verse rnniGr 


ODl Id 


A^l Aft 
OD 140 




0 1 D vjuz region vjz 4444Z i orwara_r^nnier 


44oDj 


AA'^SQ 
44Jjy 


1U40 


J i J uuz region ltz 4444z iz Keverse rnnier 


44DUJ 


AAASl 
44451 


iU4/ 


J 1 o uuz_region_Lrz 444 zz_ 1 y _r or warQ_r rimer 


440 DD 


44DDy 


iU4o 


3 1 D uuz region oz 444zz i y_Jtveverse_rr iiner 


A/I CAC 

44DUD 


AAASil 
44461 




J 1 0 uuz region vjz 44 1 j o__ i y _ror wara_irrinier 


44U ID 


441UU 




^I^IOnO rAfTir\n C^O AA^ 10 PA\r<aroia Primer 

310 vjuz region vJz 44 1 j o_ 1 y _j\.e v crac_Jrrinier 


44ZJZ 


44ZZ/ 




J 1 jv^uz_regioii_oz___4H-i'+i_i /_r'orwdru__r^riuier 


4^U / J 


441UO 




3 ijuuz region ijz 44 i4i_ i / _K.everse_x rimer 


AA9S9 
44ZDZ 


AA997 
44ZZ/ 




D iDVJuz__region_LTZ yu / oz_i / or wara^r rimer 


yuoD / 


yUODD 




j iovjuz_region_oz yu /dz_i /__Keverse_r rimer 


onsi A 
yuoi4 


yu/yu 


1U03 


J 1 J uuz__region_oz iudz4 i_i 4_r orwara_rrimer 


iUDlOU 


1 n^i ftA 

lUDlo4 


JUDO 


J 1 J uuz region vjz i ut)Z4 1_ 1 4_Keverse„rTimer 


1U0J41 


1 A<'1 1 7 
lUDOi / 




j 1 juuz_region_LrZ luyo /u_iz_r' or war a_r rimer 


1 AQi!;AQ 

iuyouy 


iuyDo4 




J ID UUZ region \jl luyo/o iz Keverse r rimer 


1 AQ70'2 

lUy (yD 


luy /DO 


1 ncQ 


DID uuz__region_LrZ ooZ4Z_ 1 4_r or war ci_r rimer 


<5Dl34 


Q/:i CO 


lUOU 


D i D uuz_region_Ljz odz4Z_ 1 4_Keverse_r rimer 




oozyo 


iUOi 


DID \ Juz_regio n„Lrz o J 1 uy _ i z_r or war u_r rimer 


OjUI / 


oDU4i 


iUOZ 


D iDUUZ_region_oz od iuy_iz_Keverse_r rimer 


ojZUZ 


oOl /o 


1 C^f.'X 


D 1 D u uz__region_oz i U4o i _ i d _r or wara^r^rimer 


1 A/f 1 Q 

lU41o 


1U44Z 


1Ud4 


D 1 D uuz_r egion__ijZ i U4 o i _ i d _Keverse__r rimer 


1 Af;AO 

luouy 


lUDcSD 


iUOD 


DID uuz region (jZ o / c)Uo_ l D_r or war a_r rimer 


o/dDZ 


D/D / / 


1 r\/c^ 
IUdd 


D 1 D UUZ_region_Cj2 o / oUo_ i D_Keverse_Frimer 


0/ /4D 


£.nno 1 
0/ /Zl 


1 fiAT 
lUO/ 


D iDUUZ_region_vjrZ d3Z /D_4u_rorwara_rrimer 


DO 140 


ool /o 


iUoo 


D iDUUZ_region_LTZ ooz / D_4D_Keverse_r rimer 


/i'2'2/17 

03o4/ 


zrooO'i 

oooZo 


iuoy 


DID uuz_region_uz oz4UD_ 1 4_r orwara_j:^rimer 


OZO /4 


/TOO AA 


1 n7n 
lU /u 


^1^r/^A9 -t-iarrii^n (~XO ^9/(A^ 1/1 P £n rck*-0£i Pfiiv\ia*- 

D 1 Duuz_region_uz dz4UD_ i4„Keverse_r^rimer 


OZD /D 


/;9<*jo 
oZddZ 


1 071 
iU / 1 


D 1 D uuz_region_uz d d d dd_ i z_r or warQ_r^rimer 


OD4DU 


0 J4o4 


1079 


0 1 J wuz_region_oz d j d o j)_ i z_iveverse_x rimer 


OOO /U 


J J 040 


\(YTX 
iXj I D 


DiDuuz region uz jj i40_i4_rorwarci_r rimer 


Dzy4y 


Dzy /o 


lU/4 


D iDUUZ_region_uz dd i4o_i4_Keverse_r rimer 


ojiyi 


OO 1 ii:'? 

JOlO/ 




DID UUZ_region_CjZ 1 UZ 1 / y _Zy_r orwara_rrimer 


1 AO 1 AO 

lUZiUZ 


1 AO 10^ 

lUZlzo 


lU/O 


D iDUUZ_region„(jZ iUzl /y_zy_Keverse_r rimer 


1 AO'S CO 

iUZdjz 


1 AO 'SOT 

102327 


lU/ / 


D iDUUz region uz zo4o 1 D_r orwara_rrimer 


2553 


2577 


lU/cS 


DID UUZ__region_uZ zd4d_ 1 D_Keverse_r rimer 


OOAA 

zoOy 


000 /I 
27 o4 


1 n7Q 

iu/y 


D 1 DUUZ_region_CjZ / ooDZ_Z4_r'orwara_rTimer 


/oDo/ 


T^rc A1 


lUoU 


D 1 D uuz_region_vjZ / odd z_Z4_Keverse_r rimer 


/DooD 


TAO 1 0 


1 nsi 

lUol 


DID uuz_,region_uz o dzou_ 1 4_r or wara__r rimer 


OOUDZ 


DDU/ / 


iUcSZ 


D ID uuz_region_uz oozou_i4_Keverse_rTimer 


^fJl.'^A 
OOJo4 


£/;onQ 

oDouy 


iUo:? 


D 1 D uuz_region_vjz D4 / oo__i o_rorwara__rTimer 


D4D4U 


D4DDO 




o 1 ji^uz region vjz / Oo__ i j_ive verse__rnmer 


D4yzD 


D4oyy 


1085     515O02_region_G2_62580_14_Forward_Primer          62552               62576
1086     515O02_region_G2_62580_14_Reverse_Primer          62840               62816
1087     515O02_region_G2_34598_55_Forward_Primer          34473               34497
1088     515O02_region_G2_34598_55_Reverse_Primer          34765               34739
1089     515O02_region_G2_77680_13_Forward_Primer          77444               77470




SI son? rptrinn G7 77fSS0 H Revpr<ip Primer 


77741 


77716 


1091     515O02_region_G2_77693_12_Forward_Primer          77444               77470
1092     515O02_region_G2_77693_12_Reverse_Primer          77741               77716
1093     515O02_region_G2_97392_14_Forward_Primer          97255               97280
1094     515O02_region_G2_97392_14_Reverse_Primer          97554               97530
1095     515O02_region_G2_97359_15_Forward_Primer          97255               97280
1096     515O02_region_G2_97359_15_Reverse_Primer          97554               97530

Seq Num  Seq ID
1120     consensusLRR
1121     rhg1LRR
1122     Rhg4LRR


Seq Num  Seq ID                           Primer location on 240O17_region_G3
1123     240O17_region_G3_forward_1_b     45046-45072





DETAILED DESCRIPTION OF THE INVENTION 

A) rhg1

The present invention provides a method for the production of a soybean plant having an
rhg1 SCN resistant allele comprising: (A) crossing a first soybean plant having an rhg1 SCN
resistant allele with a second soybean plant having an rhg1 SCN sensitive allele to produce a
segregating population; (B) screening the segregating population for a member having an rhg1
SCN resistant allele with a first nucleic acid molecule capable of specifically hybridizing to
linkage group G, wherein the first nucleic acid molecule specifically hybridizes to a second
nucleic acid molecule that is linked to the rhg1 SCN resistant allele; and, (C) selecting the
member for further crossing and selection.
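
By way of illustration only, steps (B) and (C) can be viewed as a marker-based filter over the segregating population. The following minimal sketch assumes each plant has already been genotyped at a marker linked to the resistant allele; the `Plant` record, the "R"/"S" allele labels, and the sample population are hypothetical and are not taken from this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plant:
    plant_id: str
    marker_alleles: tuple  # two alleles at a linked marker (hypothetical "R"/"S" labels)


def select_for_crossing(population, resistant_allele="R", require_homozygous=False):
    """Keep members whose linked-marker genotype carries the resistant allele."""
    selected = []
    for plant in population:
        copies = plant.marker_alleles.count(resistant_allele)
        if copies == 2 or (copies == 1 and not require_homozygous):
            selected.append(plant)
    return selected


# Hypothetical segregating population after the cross in step (A).
population = [
    Plant("F2-001", ("R", "R")),
    Plant("F2-002", ("R", "S")),
    Plant("F2-003", ("S", "S")),
]
carriers = select_for_crossing(population)
homozygous = select_for_crossing(population, require_homozygous=True)
```

Whether heterozygous carriers are retained is a breeding decision; the `require_homozygous` flag is included only to show where that choice would enter.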

rhg1 is located on linkage group G (Concibido et al., Crop Sci. 36:1643-1650 (1996)).
SCN resistant alleles of rhg1 provide partial resistance to SCN races 1, 2, 3, 5, 6, and 14
(Concibido et al., Crop Sci. 37:258-264 (1997)). Also, Webb (U.S. Patent 5,491,081) reports
that a QTL on linkage group G (rhg1) provides partial resistance to SCN races 1, 2, 3, 5, and 14.
Together, rhg1 and Rhg4 provide complete or nearly complete resistance to SCN race 3 (U.S. Patent
5,491,081). Although rhg1 was initially thought to be a recessive gene, that classification
has been questioned.

Using bioinformatic approaches, the rhg1 coding region is predicted to contain either
four exons (rhg1, v.1) (coding coordinates 45163-45314, 45450-45509, 46941-48763, and
48975-49573 of SEQ ID NO: 2) or two exons (rhg1, v.2) (coding coordinates 46798-48763 and
48975-49573 of SEQ ID NO: 3). rhg1, v.1 encodes an 877 amino acid polypeptide; rhg1, v.2 encodes
an 854 amino acid polypeptide. rhg1 codes for a Xa21-like receptor kinase (SEQ ID
NOs: 1097, 1098, and 1100-1115) (Song et al., Science 270:1804-1806 (1995)). rhg1 has an
extracellular leucine rich repeat (LRR) domain (rhg1, v.1, SEQ ID NO: 1097, residues 164-457;
rhg1, v.2, SEQ ID NO: 1098, residues 141-434), a transmembrane domain (rhg1, v.1, SEQ ID
NO: 1097, residues 508-530; rhg1, v.2, SEQ ID NO: 1098, residues 33-51 and 485-507), and
a serine/threonine protein kinase (STK) domain (rhg1, v.1, SEQ ID NO: 1097, residues 578-869;
rhg1, v.2, SEQ ID NO: 1098, residues 555-846). In a preferred embodiment, the LRR domain
has multiple LRR repeats. In a more preferred embodiment, the LRR domain has 12 LRR
repeats.
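
The stated polypeptide lengths can be checked against the coding coordinates. The sketch below is illustrative only; it assumes that the coordinates are inclusive and that the final codon of the last exon is a stop codon, neither of which is stated explicitly above. Under those assumptions the two exon models give 877 and 854 amino acids, matching the lengths recited for rhg1, v.1 and rhg1, v.2.

```python
def coding_length(exons):
    """Total coding nucleotides for exons given as inclusive (start, end) pairs."""
    return sum(end - start + 1 for start, end in exons)


def polypeptide_length(exons):
    """Amino acid count, assuming the last codon of the final exon is a stop codon."""
    nucleotides = coding_length(exons)
    if nucleotides % 3 != 0:
        raise ValueError("coding length is not a multiple of three")
    return nucleotides // 3 - 1


# Coding coordinates recited above for SEQ ID NO: 2 (v.1) and SEQ ID NO: 3 (v.2).
RHG1_V1_EXONS = [(45163, 45314), (45450, 45509), (46941, 48763), (48975, 49573)]
RHG1_V2_EXONS = [(46798, 48763), (48975, 49573)]

v1_length = polypeptide_length(RHG1_V1_EXONS)
v2_length = polypeptide_length(RHG1_V2_EXONS)
```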

To identify proteins similar to the proteins encoded by rhg1 candidates, database searches
are performed using the predicted peptide sequences. The rhg1 candidate shows similarity to
CAA18124, which is the Arabidopsis putative receptor kinase (58.4% similarity and 35.8%
identity (CLUSTALW (default parameters), Thompson et al., Nucleic Acids Res. 22:4673-4680
(1994)), GCG package, Genetics Computer Group, Madison, Wisconsin), and to the apple
leucine-rich receptor-like protein kinase (g3641252) (53.2% similarity and 31.5% identity
(CLUSTALW (default parameters))), which has both LRR and STK domains, showing
conservation in both the LRR and STK domains. The predicted LRR extracellular domain shows
similarity to the tomato resistance genes Cf-2.1 (Lycopersicon pimpinellifolium) (66.9%
similarity and 45.4% identity (CLUSTALW (default parameters))) and Cf-2.2 (Lycopersicon
pimpinellifolium) (66.9% similarity and 45.4% identity (CLUSTALW (default parameters))).
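
The identity figures quoted above are alignment statistics. As a rough illustration of how such a figure is obtained from a pairwise alignment, the sketch below counts identical residues over aligned, ungapped columns. Excluding gapped columns from the denominator is an assumption and is not necessarily how CLUSTALW or the GCG package reports identity; the example sequences are invented. Percent similarity would additionally require a substitution matrix and is not attempted here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of ungapped aligned columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap (assumed convention)
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Invented toy alignment, not a sequence from this disclosure.
identity = percent_identity("LKNL-SG", "LRNLTSG")
```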

Figure 1 is an alignment of the LRR domain of the rhg1 gene. A consensus sequence for
the LRR is shown as the top row of the alignment. Each row of amino acids represents an LRR
domain. The boxed region indicates the putative β-turn/β-sheet structural motif postulated to be
involved in ligand binding (Jones and Jones, Adv. Bot. Res. Incorp. Adv. Plant Path. 24:89-167
(1997)). The hydrophobic leucine residues are thought to project into the core of the protein
while the flanking amino acids are thought to be solvent exposed where they may interact with
the ligand (Kobe and Deisenhofer, Nature 374:183-186 (1995)). Non-conservative changes in
this region are thought to affect folding. An "x" represents an arbitrary amino acid while an "a"
represents a hydrophobic residue (leucine, isoleucine, methionine, valine, or phenylalanine).
Amino acid substitutions between resistant and sensitive phenotypes are bordered by a double
line. The amino acid substitution within the 302-325 region is a histidine/asparagine
substitution, and the amino acid substitution within the 422-445 region is a phenylalanine/serine
substitution.
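
The "x" and "a" symbols used in the consensus row can be read as a simple matching rule. The sketch below is illustrative only: the consensus pattern and the peptide segments are hypothetical and are not the consensus of Figure 1. Only the meaning of "x" (any residue) and "a" (leucine, isoleucine, methionine, valine, or phenylalanine) is taken from the text; treating every other symbol as a literal residue is an assumption.

```python
HYDROPHOBIC = set("LIMVF")  # residues covered by the "a" symbol


def matches_consensus(segment, consensus):
    """True if the segment fits the consensus: x = any residue, a = hydrophobic."""
    if len(segment) != len(consensus):
        return False
    for residue, symbol in zip(segment.upper(), consensus):
        if symbol == "x":
            continue
        if symbol == "a":
            if residue not in HYDROPHOBIC:
                return False
        elif residue != symbol.upper():
            return False  # any other symbol is treated as a literal residue
    return True


# Hypothetical pattern and segments, for illustration only.
PATTERN = "axxLxLxxN"
fits = matches_consensus("LTSLKLSHN", PATTERN)
fails = matches_consensus("KTSLKLSHN", PATTERN)
```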

As used herein, a naturally occurring rhg1 allele is any allele that encodes for a protein
having an extracellular LRR, a transmembrane domain, and STK domain where the naturally
occurring allele is present on linkage group G and where certain rhg1 alleles, but not all rhg1
alleles, are capable of providing or contributing to resistance or partial resistance to a race of
SCN. It is understood that such an allele can, using for example methods disclosed herein, be
manipulated so that the nucleic acid molecule encoding the protein is no longer present on
linkage group G. It is also understood that such an allele can, using for example methods
disclosed herein, be manipulated so that the nucleic acid molecule sequence is altered.

As used herein, an rhg1 SCN resistant allele is any rhg1 allele where that allele alone or
in combination with other SCN resistant alleles present in the plant, such as an Rhg4 SCN
resistant allele, provides resistance to a race of SCN, and that resistance is due, at least in part, to
the genetic contribution of the rhg1 allele.

SCN resistance or partial resistance is determined by a comparison of the plant in
question with a known SCN sensitive host, Lee 74, according to the method set forth in Schmitt,
J. Nematol. 20:392-395 (1988). As used herein, resistance to a particular race of SCN is defined
as having less than 10% of cyst development relative to the SCN sensitive host Lee 74.
Moreover, as used herein, partial resistance to a particular race of SCN is defined as having more
than 10% but less than 75% of cyst development relative to the SCN sensitive host Lee 74.
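
The two thresholds defined above can be expressed as a small classification routine. The following is a sketch under stated assumptions: the function name and its inputs (cyst counts on the test plant and on Lee 74) are hypothetical, and the labels returned for exactly 10% and for 75% or more are assumptions, since the definitions above cover only values below 10% and values between 10% and 75%.

```python
def classify_scn_response(cysts_on_plant, cysts_on_lee74):
    """Classify a plant by cyst development relative to the sensitive host Lee 74."""
    if cysts_on_lee74 <= 0:
        raise ValueError("the Lee 74 control must have a positive cyst count")
    if cysts_on_plant < 0:
        raise ValueError("cyst count cannot be negative")
    relative = 100.0 * cysts_on_plant / cysts_on_lee74
    if relative < 10.0:
        return "resistant"
    if 10.0 < relative < 75.0:
        return "partially resistant"
    if relative == 10.0:
        return "undefined"  # exactly 10% is not covered by either definition
    return "not resistant"  # 75% or more; label is an assumption


# Hypothetical counts, for illustration only.
low = classify_scn_response(5, 100)
middle = classify_scn_response(40, 100)
high = classify_scn_response(90, 100)
```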

Any soybean plant having an rhg1 SCN resistant allele can be used in conjunction with
the present invention. Soybeans with known rhg1 SCN resistant alleles can be used. Such
soybeans include but are not limited to PI548402 (Peking), PI200499, A2869, Jack, A2069,
PI209332 (No.4), PI404166 (Krasnoaarmejkaja), PI404198 (Sun huan do), PI437654
(Er-hej-jan), PI438489 (Chiquita), PI507354 (Tokei 421), PI548655 (Forrest), PI548988 (Pickett),
PI84751, PI437654, PI40792, Pyramid, Nathan, AG2201, A3469, AG3901, A3904, AG4301,
AG4401, AG4501, AG4601, PION9492, PI88788, Dyer, Custer, Manokin, and Doles. In a
preferred aspect, the soybean plant having an rhg1 SCN resistant allele is an rhg1 haplotype 2
allele. Examples of soybeans with an rhg1 haplotype 2 allele are PI548402 (Peking), PI404166
(Krasnoaarmejkaja), PI404198 (Sun huan do), PI437654 (Er-hej-jan), PI438489 (Chiquita),
PI507354 (Tokei 421), PI548655 (Forrest), PI548988 (Pickett), PI84751, PI437654, and
PI40792. In addition, using the methods or agents of the present invention, soybeans and wild
relatives of soybean such as Glycine soja can be screened for the presence of rhg1 SCN resistant
alleles.

Any soybean plant having an rhg1 SCN sensitive allele can be used in conjunction with
the present invention. Such soybeans include A3244, A2833, AG3001, Williams, Will, A2704,
Noir, DK23-51, Lee 74, Essex, Minsoy, A1923, and Hutcheson. In a preferred aspect, the
soybean plant having an rhg1 SCN sensitive allele is an rhg1 A3244 allele. In addition, using
the methods or agents of the present invention, soybeans and wild relatives of soybean such as
Glycine soja can be screened for the presence of rhg1 SCN sensitive alleles.

Table 2, below, is a table showing single nucleotide polymorphisms (SNPs) and
insertions/deletions (INDEL) sites for eight haplotype sequences of rhg1.

TABLE 2

Identification                Base number of contig 240O17_region_G3 of reference line A3244
Hap  PI#       Line      Ph.  45173  45309  45400  45416  45439  45611  45916  45958  46049  46113  46227  46703  47057  47140  47208
1    -         A3244     S    G      G      A      T      A      A      A      A      C      A      d1     0      T      C      C
2    PI548402  Peking    R    G      A      C      C      T      A      G      A      T      G      0      d2     C      C      C
3    PI423871  Toyosuzu  -    G      A      A      T      A      A      G      A      T      G      0      0      T      C      C
4    PI518672  Will      S    G      G      A      T      A      A      A      A      C      A      d1     0      T      A      T
5    -         A2704     S    G      G      A      T      A      A      A      A      C      A      d1     0      T      A      T
6    PI290136  Noir      S    A      A      A      C      T      G      A      T      T      A      0      d14    T      C      C
7    PI548658  Lee 74    S    A      A      A      C      T      G      A      T      T      A      0      d14    T      C      C
8    PI200499  -         R    G      A      A      C      A      A      A      A      T      A      0      d14    T      C      C
N/A  PI548667  Essex     S    A      A      A      C      T      G      A      T      T      A      0      d14    T      C      C
N/A  PI548389  Minsoy    S    G      G      A      T      A      A      A      A      C      A      d1     0      T      A      T
N/A  PI360843  Oshima.   -    -      -      -      -      -      -      -      -      -      -      -      0      T      A      T
N/A  -         A2869     R    -      -      -      -      -      -      -      -      -      -      0      d14    T      C      C
N/A  PI540556  Jack      R    -      -      -      -      -      -      -      -      -      -      -      -      -      -      -
N/A  -         A2069     R    -      -      -      -      -      -      -      -      -      -      -      -      -      -      -
N/A  PI209332  No.4      R    -      -      -      -      -      -      -      -      -      -      -      -      -      -      -

TABLE 2, continued

Identification                Base number of contig 240O17_region_G3 of reference line A3244
Hap  PI#       Line      Ph.  47571  47617  47796  47856  47937  48012  48060  48073  48135  48279  48413  48681  48881  49012  49316
1    -         A3244     S    G      C      A      T      T      T      C      C      A      C      G      A      0      A      T
2    PI548402  Peking    R    G      C      C      C      C      T      C      C      G      C      G      G      d19    G      T
3    PI423871  Toyosuzu  -    G      C      C      C      C      T      C      C      G      C      G      A      0      A      T
4    PI518672  Will      S    G      C      A      T      T      T      C      C      A      C      G      A      0      A      T
5    -         A2704     S    G      C      A      T      T      C      T      T      G      T      C      -      0      G      C
6    PI290136  Noir      S    A      A      C      C      C      C      T      T      G      T      C      G      0      G      C
7    PI548658  Lee 74    S    G      A      C      C      C      C      T      T      G      T      C      G      0      G      C
8    PI200499  -         R    G      A      C      C      C      C      T      T      G      T      C      G      0      G      C
N/A  PI548667  Essex     S    G      A      C      C      C      C      T      T      G      T      C      A/G    0      G      C
N/A  PI548389  Minsoy    S    G      C      A      T      T      C/T    C/T    C/T    A      C      G      A      0      A      T
N/A  PI360843  Oshimas.  -    G      C      A      T/C    T/C    T      C      C      A/G    C      G      A      0      A      T
N/A  -         A2869     R    G      A      C      C      C      C      T      T      G      T      C      G      0      G      C
N/A  PI540556  Jack      R    -      -      C      C      C      C      T      T      G      T      C      G      0      G      C
N/A  -         A2069     R    -      -      C      T/C    T/C    C      T      T      A      T      C      G      0      G      C
N/A  PI209332  No.4      R    -      -      C      C      G      C      T      T      A/G    T      C      G      0      G      C



In Table 2, discrete haplotypes are designated 1 through 8. N/A refers to a haplotype that
is not characterized. The Plant Introduction classification number is indicated in the "PI#"
column. A dash indicates that no PI number is known or assigned for the line under
investigation. The line from which the sequences are derived is indicated in the "Line" column,
with a dash indicating an unknown or unnamed line. The "Ph." (phenotype) column of Table 2
indicates whether a given line has been reported as resistant (R) to at least one race of SCN or
sensitive (S).

The nucleotide bases located at each of 30 positions in each of the haplotype sequences are
shown in the columns labeled "Base number of contig 240O17_region_G3 of reference line
A3244." The base number at the top of each column corresponds to the base number in contig
240O17_region_G3 of reference line A3244 (SEQ ID NOs: 2 and 3). The letters G, A, C, and T
correspond to the bases guanine, adenine, cytosine, and thymine. Two bases separated by a slash
(A/G, C/T, or T/C) indicate uncertainty at the specified position of the haplotype sequence. A
"d" followed by a number indicates a deletion of the length specified. That is, d1 is a one base
deletion, d2 is a two base deletion, d14 is a fourteen base deletion, and d19 is a nineteen base
deletion. A zero (0) indicates no deletion. A dash indicates that the identity of the base is
undetermined.
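
The cell notation described above is regular enough to be read mechanically. The sketch below is illustrative only; the function name and the returned labels are hypothetical, and the routine simply encodes the five cases named in the legend (a single base, two bases separated by a slash, "d" followed by a length, a zero, and a dash).

```python
import re


def parse_cell(cell):
    """Decode one Table 2 cell into a (kind, value) pair using the legend's notation."""
    text = cell.strip()
    if text == "-":
        return ("undetermined", None)
    if text == "0":
        return ("no_deletion", 0)
    deletion = re.fullmatch(r"d(\d+)", text)
    if deletion:
        return ("deletion", int(deletion.group(1)))
    pair = re.fullmatch(r"([ACGT])/([ACGT])", text.upper())
    if pair:
        return ("ambiguous", (pair.group(1), pair.group(2)))
    if text.upper() in {"A", "C", "G", "T"}:
        return ("base", text.upper())
    raise ValueError(f"unrecognized cell: {cell!r}")


examples = [parse_cell(c) for c in ("G", "A/G", "d14", "0", "-")]
```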

Examination of Table 2 reveals that the amino acid substitutions in the rhg1 coding region
are common to the resistant lines PI467312 (Cha-mo-shi-dou), PI88788 and the southern
susceptible lines Essex, Hutcheson, Noir and A1923. As used herein, a "southern" cultivar is
any cultivar from maturity groups VI, VII, VIII, IX, or X, and a "northern" cultivar is any
cultivar from maturity groups 000, 00, 0, I, II, III, IV, or V. These data are consistent with the
mapping experiments of Qui et al. (Theor. Appl. Genet. 98:356-364 (1999)). Based on analysis of
200 F2:3 families derived from a cross between Peking and Essex, the authors failed to detect any
significant association between resistance to SCN races 1, 2, and 3 and the rhg1 locus on linkage
group G. The authors point out that one of the markers, Bng122, which has been shown to have
significant linkage to rhg1 (Concibido et al., Crop Sci. 36:1643-1650 (1996)), is not
polymorphic in the population employed. It is also possible that the susceptible southern lines
contain rhg1 and the susceptible phenotype reflects the polygenic nature of SCN resistance. In a
study to uncover QTLs for sudden death syndrome (SDS) in soybean, two SCN resistant alleles
originating from the susceptible parent Essex have been described (Hnetkovsky et al., Crop Sci.
36:393-400).
Tables 3a, 3b, and 3c, below, show lines that share an rhg1 haplotype.



TABLE 3a

Haplotype 2 Lines
PI#       Line              Ph.
PI548402  Peking            R
PI404166  Krasnoaarmejkaja  R
PI404198  (Sun huan do)     R
PI437654  Er-hej-jan        R
PI438489  (Chiquita)        R
PI507354  Tokei 421         R
PI548655  Forrest           R
PI548988  Pickett           R
PI84751   -                 R
PI437654  -                 R
PI40792   -                 -


TABLE 3b

Haplotype 4 Lines
PI#       Line              Ph.
-         Will              S
PI467312  Cha-mo-shi-dou    R
PI88788   -                 R

TABLE 3c

Haplotype 6 Lines
PI#       Line              Ph.
-         Noir              S
-         A1923             S
-         Hutcheson         S



In Tables 3a, 3b, and 3c, Plant Introduction classification number is indicated in the 
"PI#" column. A dash indicates that no PI number is known or assigned for the line in question. 
The line from which the sequences are derived is indicated in the "line" column, with a dash 
indicating an unknown or unnamed line. The "Ph." column indicates whether a given line has 
been reported as resistant (R) to at least one race of SCN or sensitive (S), with a dash indicating 
that the phenotype is unknown. 

In a preferred aspect, the source of either an rhgl SCN sensitive allele or an rhgl SCN 
resistant allele, or more preferably both, is an elite plant. An "elite line" is any line that has 
resulted from breeding and selection for superior agronomic performance. Examples of elite 
lines are lines that are commercially available to farmers or soybean breeders such as HARTZ™ 
variety H4994, HARTZ™ variety H5218, HARTZ™ variety H5350, HARTZ™ variety H5545, 
HARTZ™ variety H5050, HARTZ™ variety H5454, HARTZ™ variety H5233, HARTZ™ 
variety H5488, HARTZ™ variety HLA572, HARTZ™ variety H6200, HARTZ™ variety H6104, 
HARTZ™ variety H6255, HARTZ™ variety H6586, HARTZ™ variety H6191, HARTZ™ 
variety H7440, HARTZ™ variety H4452 Roundup Ready™, HARTZ™ variety H4994 Roundup 
Ready™, HARTZ™ variety H4988 Roundup Ready™, HARTZ™ variety H5000 Roundup 
Ready™, HARTZ™ variety H5147 Roundup Ready™, HARTZ™ variety H5247 Roundup 
Ready™, HARTZ™ variety H5350 Roundup Ready™, HARTZ™ variety H5545 Roundup 
Ready™, HARTZ™ variety H5855 Roundup Ready™, HARTZ™ variety H5088 Roundup 
Ready™, HARTZ™ variety H5164 Roundup Ready™, HARTZ™ variety H5361 Roundup 
Ready™, HARTZ™ variety H5566 Roundup Ready™, HARTZ™ variety H5181 Roundup 
Ready™, HARTZ™ variety H5889 Roundup Ready™, HARTZ™ variety H5999 Roundup 
Ready™, HARTZ™ variety H6013 Roundup Ready™, HARTZ™ variety H6255 Roundup
Ready™, HARTZ™ variety H6454 Roundup Ready™, HARTZ™ variety H6686 Roundup
Ready™, HARTZ™ variety H7152 Roundup Ready™, HARTZ™ variety H7550 Roundup
Ready™, HARTZ™ variety H8001 Roundup Ready™ (HARTZ SEED, Stuttgart, Arkansas,
U.S.A.); A0868, AG0901, A1553, A1900, AG1901, A1923, A2069, AG2101, AG2201, A2247,
AG2301, A2304, A2396, AG2401, AG2501, A2506, A2553, AG2701, AG2702, A2704, A2833,
A2869, AG2901, AG2902, AG3001, AG3002, A3204, A3237, A3244, AG3301, AG3302,
A3404, A3469, AG3502, A3559, AG3601, AG3701, AG3704, AG3750, A3834, AG3901,
A3904, A4045, AG4301, A4341, AG4401, AG4501, AG4601, AG4602, A4604, AG4702,
AG4901, A4922, AG5401, A5547, AG5602, A5704, AG5801, AG5901, A5944, A5959,
AG6101, QR4459 and QP4544 (Asgrow Seeds, Des Moines, Iowa, U.S.A.); DeKalb variety
CX445 (DeKalb, Illinois). An elite plant is any plant from an elite line.

B) Rhg4 

The present invention provides a method for the production of a soybean plant having an 
Rhg4 SCN resistant allele comprising: (A) crossing a first soybean plant having an Rhg4 SCN 
resistant allele with a second soybean plant having an Rhg4 SCN sensitive allele to produce a 
segregating population; (B) screening the segregating population for a member having an Rhg4 
SCN resistant allele with a first nucleic acid molecule capable of specifically hybridizing to 
linkage group A2, wherein the first nucleic acid molecule specifically hybridizes to a second 
nucleic acid molecule linked to the Rhg4 SCN resistant allele; and, (C) selecting the member for 
further crossing and selection. 

Rhg4 is located on linkage group A2 (Matson and Williams, Crop Sci. 5:447 (1965)). 
SCN resistant alleles of Rhg4 provide partial resistance to SCN races 1 and 3 (U.S. Patent 
5,491,081). Together, rhg1 and Rhg4 provide complete or nearly complete resistance to SCN 
race 3. The dominant gene, Rhg4, was found to be closely linked to the seed coat color locus (i) 
(Matson and Williams, Crop Sci. 5:447 (1965)). The i locus in Peking was also reported to be 
linked with a recessive gene for resistance to SCN (Sugiyama and Katsumi, Jpn. J. Breed. 16:83-86 
(1966)). It is possible that Rhg4 and the recessive gene linked to the i locus are one and the 
same, which would call into question the classification of Rhg4 as a dominant gene. 

Using bioinformatic approaches, the Rhg4 coding region is predicted to contain 2 exons 
(coding coordinates 111805-113968 and 114684-115204 of SEQ ID NO: 4). Rhg4 encodes an 
894 amino acid polypeptide. Rhg4 codes for a Xa21-like receptor kinase (SEQ ID NOs: 1099 
and 1116-1119) (Song et al., Science 270:1804-1806 (1995)). Rhg4 has an extracellular LRR 
domain (Rhg4, SEQ ID NO: 1099, residues 34-44), a transmembrane domain (Rhg4, SEQ ID 
NO: 1099, residues 449-471), and an STK domain (Rhg4, SEQ ID NO: 1099, residues 531-830). In 
a preferred embodiment, the LRR domain has multiple LRR repeats. In a more preferred 
embodiment, the LRR domain has 12 LRR repeats. 

To identify proteins similar to the Rhg4 candidate, database searches are performed using 
the predicted peptide sequences. The Rhg4 candidate shows similarity to TMK (Y07748) (73.0% 
similarity and 54.8% identity (CLUSTALW (default parameters))) and TMK1 PRECURSOR 
(70.6% similarity and 55.1% identity (CLUSTALW (default parameters))), which are rice and 
Arabidopsis receptor kinases, respectively. The predicted LRR extracellular domain reveals 
similarity to TMK (Y07748) (70.1% similarity and 46.6% identity (CLUSTALW (default 
parameters))), TMK1 PRECURSOR (g1707642) (65.8% similarity and 48.8% identity 
(CLUSTALW (default parameters))), and F21J9.1 (g2213607) (65.5% similarity and 45.6% 
identity (CLUSTALW (default parameters))). 
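
As an illustration of what a percent identity figure of this kind measures, the following minimal sketch counts identical residues over the aligned, non-gap columns of a pairwise alignment. It is not the CLUSTALW implementation: the function name and the example sequences are hypothetical, and the convention of dividing by the number of non-gap columns is an assumption (alignment programs differ on the denominator). Percent similarity additionally requires a substitution matrix and is not computed here.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned (non-gap) columns where both residues are identical.

    Both inputs must be rows of the same pairwise alignment, with '-' as the
    gap character. The denominator convention is an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap in either row
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical toy alignment (not taken from any SEQ ID NO).
example = percent_identity("MKT-LL", "MKSALL")
```

In the toy alignment, five columns are compared and four are identical.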

Figure 2 is an alignment of the LRR domain of the Rhg4 gene. A consensus sequence is 
shown as the top row. Each row of amino acids represents an LRR domain. The boxed region 
indicates the putative β-turn/β-sheet structural motif postulated to be involved in ligand binding 
(Jones and Jones, Adv. Bot. Res. Incorp. Adv. Plant Path. 24:89-167 (1997)). The hydrophobic 
leucine residues are thought to project into the core of the protein while the flanking amino acids 
are thought to be solvent exposed where they may interact with the ligand (Kobe and 
Deisenhofer, Nature 374:183-186 (1995)). An "x" represents an arbitrary amino acid while an 
"a" represents a hydrophobic residue (leucine, isoleucine, methionine, valine, or phenylalanine). 
Amino acid substitutions between resistant and sensitive phenotypes are bordered by a double 
line. The amino acid substitution within the 35-57 region is a histidine/glutamine substitution, 
and the amino acid substitution within the 81-104 region is a leucine/phenylalanine substitution. 

As used herein, a naturally-occurring Rhg4 allele is any allele that encodes for a protein 
having an extracellular LRR domain, a transmembrane domain, and an STK domain where the 
naturally occurring allele is present on linkage group A2 and where certain Rhg4 alleles, but not 
all Rhg4 alleles, are capable of providing or contributing to resistance or partial resistance to a 
race of SCN. It is understood that such an allele can, using, for example, methods disclosed 
herein, be manipulated so that the nucleic acid molecule encoding the protein is no longer 
present on linkage group A2. It is also understood that such an allele can, using, for example, 
methods disclosed herein, be manipulated so that the nucleic acid molecule sequence is altered. 

As used herein, an Rhg4 SCN resistant allele is any Rhg4 allele where that allele alone or 
in combination with other SCN resistant alleles present in the plant, such as an rhg1 SCN 
resistant allele, provides resistance to a race of SCN, and that resistance is due, at least in part, to 
the genetic contribution of the Rhg4 allele. 

Any soybean plant having an Rhg4 SCN resistant allele can be used in conjunction with 
the present invention. Soybeans with known Rhg4 SCN resistant alleles can be used. Such 
soybeans include, but are not limited to, PI548402 (Peking), PI437654 (Er-hej-jan), PI438489 
(Chiquita), PI507354 (Tokei 421), PI548655 (Forrest), PI548988 (Pickett), PI88788, PI404198 
(Sun Huan Do), PI404166 (Krasnoaarmejkaja), Hartwig, Manokin, Doles, Dyer, and Custer. In a 
preferred aspect, the Rhg4 SCN resistant allele is an Rhg4 haplotype 3 allele in a plant having 
either an rhg1 haplotype 2 or rhg1 haplotype 4 allele. Examples of soybeans with an Rhg4 
haplotype 3 allele are PI548402 (Peking), PI88788, PI404198 (Sun Huan Do), PI438489 
(Chiquita), PI437654 (Er-hej-jan), PI404166 (Krasnoaarmejkaja), PI548655 (Forrest), PI548988 
(Pickett), and PI507354 (Tokei 421). In addition, using the methods or agents of the present 
invention, soybeans and wild relatives of soybeans such as Glycine soja can be screened for the 
presence of Rhg4 SCN resistant alleles. 

Table 4 below is a table showing single nucleotide polymorphisms (SNPs) for three 
haplotype sequences of Rhg4. 

TABLE 4 

Identification                                Base number of contig 318013_region_A3  Markers
Hap  PI number  Line              Ph  Coat    111933  112065  112101  112461  114066  scn279  scnb267  scn273
1    -          A2069             R   yellow  T       A       T       A       T       2       2        2
1    -          A2869             R   yellow  T       A       T       A       T       2       2        2
1    -          A3244             S   yellow  T       A       T       A       T       2       2        2
1    PI87631    Kindaizu          R   yellow  T       A       T       A       T       2       2        2
1    PI548389   Minsoy            S   yellow  T       A       T       A       T       2       2        2
1    PI518664   Hutcheson         S   yellow  T       A       T       A       T       2       2        2
1    PI548658   Lee 74            S   yellow  T       A       T       A       T       -       2        2
2    PI540556   Jack              R   yellow  G       A       T       A       T       2       2        1
2    PI360843   Oshimashirome     R   yellow  G       A       T       A       T
2    PI423871   Toyosuzu          R   yellow  G       A       T       A       T
3    PI548402   Peking            R   black   G       C       C       T       G
3    PI88788                      R   black   G       C       C       T       G
3    PI404198B  (Sun huan do)     R   black   G       C       C       T       G
3    PI438489B  (Chiquita)        R   black   G       C       C       T       G
3    PI437654   Er-hej-jan        R   black   G       C       C       T       G
3    PI404166   Krasnoaarmejkaja  R   black   G       C       C       T       G
3    PI290136   Noir              S   black   G       C       C       T       G
3    PI548655   Forrest           R   yellow  G       C       C       T       G
3    PI548988   Pickett           R   yellow  G       C       C       T       G
3    PI507354   Tokei 421         R   yellow  G       C       C       T       G
N/A  PI467312   Cha-mo-shi-dou    R   GnBr    G       C       C       T
N/A  PI209332   No.4              R   black   T       A       T                       2       2        2
N/A  PI518672   Will              S   yellow  T       A       T               T       2       2        2
N/A  PI548667   Essex             S   yellow  T       A       T               T       2       2        2


In Table 4, discrete haplotypes are designated 1 through 3. N/A refers to a haplotype that 
is not characterized. In Table 4, the Plant Introduction classification number is indicated in the 
"PI#" column. A dash indicates that no PI number is known or assigned for the line under 
investigation. The line from which the sequences are derived is indicated in the "line" column, 
with a dash indicating an unknown or unnamed line. The "Ph." column of Table 4 indicates 
whether a given line has been reported to be resistant (R) to at least one race of SCN, or sensitive 
(S). The "coat" column shows the phenotypic coat color of a seed as either yellow, black, 
green/brown (GnBr), or unknown/unassigned (dash). At the i locus, black seeded varieties 
harbor the i allele for black or imperfect black seed coat. In a preferred embodiment, the seed 
has a yellow coat. 

The nucleotide base located at each of 5 positions in each of the haplotype sequences is 
shown in the columns labeled "Base number of contig 318013_region_A3." The base number at 
the top of each column corresponds to the base number in the contig 318013_region_A3 of 
reference line A3244 (SEQ ID NO: 4). The letters G, A, C, and T correspond to the bases 
guanine, adenine, cytosine, and thymine. A dash indicates that the identity of the base is 
unknown. 
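
The haplotype definitions of Table 4 can be expressed as a small lookup, sketched below under stated assumptions. The base patterns for haplotypes 1 through 3 are taken from the fully typed rows of Table 4; the function name, the use of a five-character string, and the rule that an unknown base ("-") is compatible with any haplotype are hypothetical choices made for illustration. Table 4 itself lists some partially typed lines as N/A, so the sketch reports which haplotypes remain compatible with a set of base calls and does not assign a haplotype.

```python
# Contig 318013_region_A3 positions, in the column order of Table 4.
POSITIONS = (111933, 112065, 112101, 112461, 114066)

# Bases observed for each designated Rhg4 haplotype in Table 4.
HAPLOTYPES = {1: "TATAT", 2: "GATAT", 3: "GCCTG"}


def compatible_haplotypes(bases: str) -> list:
    """Return the Table 4 haplotypes compatible with five base calls.

    `bases` holds one call per position in POSITIONS; '-' means unknown.
    Treating '-' as a wildcard is an assumption of this sketch.
    """
    calls = bases.upper()
    if len(calls) != len(POSITIONS):
        raise ValueError("expected one base call per position")
    matches = []
    for hap, reference in sorted(HAPLOTYPES.items()):
        if all(c == "-" or c == r for c, r in zip(calls, reference)):
            matches.append(hap)
    return matches


peking_like = compatible_haplotypes("GCCTG")   # all five positions typed
partial = compatible_haplotypes("-ATAT")       # first position unknown
```

A call set with an unknown first position cannot distinguish haplotype 1 from haplotype 2, since those two differ only at base 111933.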

Three different simple sequence repeat (SSR) or microsatellite markers that occur within 
the sequences, scn279 (SEQ ID NO: 292), scn267 (SEQ ID NO: 282), and scn273 (SEQ ID NO: 
294), are listed under "markers." The allele of each marker occurring in a haplotype is indicated 
by a 1 or a 2, with a dash indicating that the information is not determined. 

Any soybean plant having an Rhg4 SCN sensitive allele can be used in conjunction with 
the present invention. Such soybeans include A3244, Will, Noir, Lee 74, Essex, Minsoy, A2704, 
A2833, AG3001, Williams, DK23-51, and Hutcheson. In a preferred aspect, the soybean plant 
having an Rhg4 SCN sensitive allele is an Rhg4 A3244 allele. In addition, using the methods or 
agents of the present invention, soybeans and wild relatives of soybean such as Glycine soja can 
be screened for the presence of Rhg4 SCN sensitive alleles. 

In a preferred aspect, the source of either an Rhg4 SCN sensitive allele or an Rhg4 SCN 
resistant allele, or more preferably both, is an elite plant. 

In Table 5 below, rhg1 and Rhg4 haplotypes for various cultivars are compared. 

TABLE 5 

Identification                            Haplotype
PI#        Line              Coat    Ph.  Rhg4  rhg1
-          A3244             yellow  S    1     1
PI548402   Peking            black   R    3     2
PI404198B  (Sun huan do)     black   R    3     2
PI438489B  (Chiquita)        black   R    3     2
PI437654   Er-hej-jan        black   R    3     2
PI404166   Krasnoaarmejkaja  black   R    3     2
PI548655   Forrest           yellow  R    3     2
PI548988   Pickett           yellow  R    3     2
PI507354   Tokei 421         yellow  R    3     2
PI88788                      black   R    3     4
PI467312   Cha-mo-shi-dou    GnBr    R    N/A   4
           Noir              black   S    3     6
-          Jack              yellow  R    2     N/A
PI360843   Oshimashirome     yellow  R    2     N/A
PI423871   Toyosuzu          yellow  R    2     3
PI209332   No.4              black   R    N/A   N/A
PI87631    Kindaizu          yellow  R    1     -
-          Minsoy            yellow  S    1     N/A
           Will              yellow  S    N/A   4
           Hutcheson         yellow  S    1     6
           Lee 74            yellow  S    N/A   7
           Essex             yellow  S    N/A   N/A
           A2069             yellow  R    1     N/A
           A2869             yellow  R    1     N/A


In Table 5, haplotypes, as used in Tables 2 through 4, are listed for each line. N/A refers 
to a haplotype that is not characterized. The Plant Introduction classification number is indicated 
in the "PI#" column. A dash indicates that no PI number is known or assigned for the line under 
investigation. The line from which the sequences are derived is indicated in the "line" column, 
with a dash indicating an unknown or unnamed line. The "Ph." column of Table 5 indicates 
whether a given line has been reported to be resistant (R) to at least one race of SCN, or sensitive 
(S). The "coat" column shows the phenotypic coat color of a seed as either yellow, black, 
green/brown (GnBr), or unknown/unassigned (dash). At the i locus, black seeded varieties 
harbor the i allele for black or imperfect black seed coat. In a preferred embodiment, the seed 
has a yellow coat. 

Screening for rhg1 and Rhg4 alleles 

Any appropriate method can be used to screen for a plant having an rhg1 SCN resistant 
allele. Any appropriate method can be used to screen for a plant having an Rhg4 SCN resistant 
allele. In a preferred aspect of the present invention, a nucleic acid marker of the present 
invention can be used (see section entitled "Screening for rhg1 and Rhg4 alleles" and subsection 
(ii) of the section entitled "Agents"). 

Additional markers, such as SSRs, AFLP markers, RFLP markers, RAPD markers, 
phenotypic markers, SNPs, isozyme markers, and microarray transcription profiles that are 
genetically linked to or correlated with alleles of a QTL of the present invention can be utilized 
(Walton, Seed World 22-29 (July, 1993); Burow and Blake, Molecular Dissection of Complex 
Traits, 13-29, Eds. Paterson, CRC Press, New York (1988)). Methods to isolate such markers 
are known in the art. For example, locus-specific SSRs can be obtained by screening a genomic 
library for SSRs, sequencing of "positive" clones, designing primers which flank the repeats, and 
amplifying genomic DNA with these primers. The size of the resulting amplification products 
can vary by integral numbers of the basic repeat unit. To detect a polymorphism, PCR products 
can be radiolabeled, separated on denaturing polyacrylamide gels, and detected by 
autoradiography. Fragments with size differences >4 bp can also be resolved on agarose gels, 
thus avoiding radioactivity. 
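
The arithmetic behind SSR allele sizing can be sketched as follows. This is an illustrative calculation only: the function names, the flanking length, and the repeat counts are hypothetical, and the 4 bp figure is simply the agarose threshold stated above, not a measured property of any particular gel system.

```python
AGAROSE_MIN_DIFFERENCE_BP = 4  # threshold quoted in the text (">4 bp")


def ssr_product_size(flanking_bp: int, unit_bp: int, repeats: int) -> int:
    """PCR product length: primers plus flanking sequence plus the repeat array."""
    return flanking_bp + unit_bp * repeats


def resolvable_on_agarose(size_a_bp: int, size_b_bp: int) -> bool:
    """True when two alleles differ by more than the quoted 4 bp threshold."""
    return abs(size_a_bp - size_b_bp) > AGAROSE_MIN_DIFFERENCE_BP


# Hypothetical dinucleotide SSR with 120 bp of primer and flanking sequence.
allele_15 = ssr_product_size(flanking_bp=120, unit_bp=2, repeats=15)
allele_16 = ssr_product_size(flanking_bp=120, unit_bp=2, repeats=16)
allele_18 = ssr_product_size(flanking_bp=120, unit_bp=2, repeats=18)
```

Under these assumptions, alleles differing by a single dinucleotide unit differ by only 2 bp and would call for polyacrylamide separation, whereas alleles three units apart differ by 6 bp.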

Other SSR markers may be utilized. Amplification of simple tandem repeats, mainly of 
the [CA]n type, was reported by Litt and Luty, Am. J. Hum. Genet. 44:397-401 (1989); 
Smeets et al., Hum. Genet. 83:245-251 (1989); Tautz, Nucleic Acids Res. 17:6463-6472 
(1989); Weber and May, Am. J. Hum. Genet. 44:388-396 (1989). Weber, Genomics 7:524-530 
(1990), reported that the level of polymorphism detected by PCR-amplified [CA]n type SSRs 
depends on the number of the "perfect" (i.e., uninterrupted), tandemly repeated motifs. Below a 
certain threshold (12 CA-repeats), the SSRs were reported to be primarily monomorphic. 
Above this threshold, however, the probability of polymorphism increases with SSR length. 
Consequently, long, perfect arrays of SSRs are preferred for the generation of markers, i.e., for 
the design and synthesis of flanking primers. 
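
A search for long, perfect CA arrays of the kind described above can be sketched with a regular expression. This is a hypothetical helper, not a method disclosed in the cited references: the function name and example sequences are invented for illustration, only the CA motif on the given strand is searched (a fuller screen would also scan for TG, the complementary motif), and the default of 12 units reflects the threshold reported in the text.

```python
import re


def find_perfect_ca_repeats(sequence: str, min_units: int = 12) -> list:
    """Return (start_index, unit_count) for each uninterrupted (CA)n array.

    Only arrays with at least `min_units` consecutive CA units are reported.
    Indices are 0-based positions in `sequence`.
    """
    pattern = re.compile(r"(?:CA){%d,}" % min_units)
    hits = []
    for match in pattern.finditer(sequence.upper()):
        units = (match.end() - match.start()) // 2
        hits.append((match.start(), units))
    return hits


# Hypothetical sequences: one long perfect array, one short array,
# and one array interrupted by a single base change (CA -> TA).
perfect = "GGT" + "CA" * 14 + "TTG" + "CA" * 5 + "G"
interrupted = "CA" * 8 + "TA" + "CA" * 8
```

The interrupted example contains sixteen CA units in total but no uninterrupted run of twelve, so it is not reported, which mirrors the distinction between perfect and imperfect repeats drawn above.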

Suitable primers can be deduced from DNA databases (e.g., Akkaya et al., Genetics 
132:1131-1139 (1992)). Alternatively, size-selected genomic libraries (200 to 500 bp) can be 
constructed by, for example, using the following steps: (1) isolation of genomic DNA; (2) 
digestion with one or more 4 base-specific restriction enzymes; (3) size-selection of restriction 
fragments by agarose gel electrophoresis, excision and purification of the desired size fraction; (4) 
ligation of the DNA into a suitable vector and transformation into a suitable E. coli strain; (5) 
screening for the presence of SSRs by colony or plaque hybridization with a labeled probe; (6) 
isolation of positive clones and sequencing of the inserts; and (7) design of suitable primers 
flanking the SSR. 

Establishing libraries with small, size-selected inserts can be advantageous for SSR 
isolation for two reasons: (1) long SSRs are often unstable in E. coli, and (2) positive clones can 
be sequenced without subcloning. A number of approaches have been reported for the 
enrichment of SSRs in genomic libraries. Such enrichment procedures are particularly useful if 
libraries are screened with comparatively rare tri- and tetranucleotide repeat motifs. One such 
approach has been described by Ostrander et al., Proc. Natl. Acad. Sci. (U.S.A.) 89:3419-3423 
(1992), who reported the generation of a small-insert phagemid library in an E. coli strain 
deficient in dUTPase (dut) and uracil-N-glycosylase (ung) genes. In the absence of dUTPase and 
uracil-N-glycosylase, dUTP can compete with dTTP for the incorporation into DNA. Single- 
stranded phagemid DNA isolated from such a library can be primed with [CA]n and [TG]n 
primers for second strand synthesis, and the products used to transform a wild-type E. coli strain. 
Since under these conditions there will be selection against single-stranded, uracil-containing 
DNA molecules, the resulting library will consist of primer-extended, double-stranded products 
with an about 50-fold enrichment in CA-repeats. 

Other reported enrichment strategies rely on hybridization selection of simple sequence 
repeats prior to cloning (Karagyozov et al., Nucleic Acids Res. 21:3911-3912 (1993); Armour et 
al., Hum. Mol. Gen. 3:599-605 (1994); Kijas et al., Genome 55:349-355 (1994); Kandpal et al., 
Proc. Natl. Acad. Sci. (U.S.A.) 91:88-92 (1994); Edwards et al., Am. J. Hum. Genet. 49:146-156 
(1991)). Hybridization selection can, for example, involve the following steps: (1) genomic 
DNA is fragmented, either by sonication, or by digestion with a restriction enzyme; (2) genomic 
DNA fragments are ligated to adapters that allow a "whole genome PCR" at this or a later stage 
of the procedure; (3) genomic DNA fragments are amplified, denatured and hybridized with 
single-stranded SSR sequences bound to a nylon membrane; (4) after washing off unbound 
DNA, hybridizing fragments enriched for SSRs are eluted from the membrane by boiling or 
alkali treatment, reamplified using adapter-complementary primers, and digested with a 
restriction enzyme to remove the adapters; and (5) DNA fragments are ligated into a suitable 
vector and transformed into a suitable E. coli strain. SSRs can be found in up to 50-70% of the 
clones obtained from these procedures (Armour et al., Hum. Mol. Gen. 3:599-605 (1994); 
Edwards et al., Am. J. Hum. Genet. 49:146-156 (1991)). 

An alternative hybridization selection strategy was reported by Kijas et al., Genome 
55:599-605 (1994), which replaced the nylon membrane with biotinylated, SSR-complementary 
oligonucleotides attached to streptavidin-coated magnetic particles. SSR-containing DNA 
fragments are selectively bound to the magnetic beads, reamplified, restriction-digested and 
cloned. 

It is further understood that other additional markers on linkage group G or A2 may be 
utilized (Morgante et al., Genome 37:763-769 (1994)). As used herein, reference to the linkage 
group of G or A2 refers to the linkage group that corresponds to linkage groups U5 and U3, 
respectively, from the genetic map of Glycine max (Mansur et al., Crop Sci. 36:1327-1336 
(1996)), and linkage groups G and A2, respectively, of Glycine max x Glycine soja (Shoemaker 
et al., Genetics 144:329-336 (1996)) that is present in Glycine soja (Soybase, an Agricultural 
Research Service, United States Department of Agriculture (http://129.186.26.940/ and USDA - 
Agricultural Research Service: http://www.ars.usda.gov/)). 

PCR-amplified SSRs can be used, because they are locus-specific, codominant, occur in 
large numbers and allow the unambiguous identification of alleles. Standard PCR-amplified 
SSR protocols use radioisotopes and denaturing polyacrylamide gels to detect amplified SSRs. 
In many situations, however, allele sizes are sufficiently different to be resolved on high 
percentage agarose gels in combination with ethidium bromide staining (Bell and Ecker, 
Genomics 19:137-144 (1994); Becker and Heun, Genome 38:991-998 (1995); Huttel, Ph.D. 
Thesis, University of Frankfurt, Germany (1996)). High resolution without applying 
radioactivity is also provided by nondenaturing polyacrylamide gels in combination with either 
ethidium bromide (Scrimshaw, Biotechniques 75:2189 (1992)) or silver staining (Klinkicht and 
Tautz, Molecular Ecology 1:133-134 (1992); Neilan et al., Biotechniques 17:708-712 (1994)). 

An alternative method of typing PCR-amplified SSRs involves the use of fluorescent primers in 
combination with a semi-automated DNA sequencer (Schwengel et al., Genomics 22:46-54 
(1994)). Fluorescent PCR products can be detected by real-time laser scanning during gel 
electrophoresis. An advantage of this technology is that different amplification reactions as well 
as a size marker (each labeled with a different fluorophore) can be combined into one lane during 
electrophoresis. Multiplex analysis of up to 24 different SSR loci per lane has been reported 
(Schwengel et al., Genomics 22:46-54 (1994)). 

The detection of polymorphic sites in a sample of DNA may be facilitated through the 
use of nucleic acid amplification methods. Such methods specifically increase the concentration 
of polynucleotides that span the polymorphic site, or include that site and sequences located 
either distal or proximal to it. Such amplified molecules can be readily detected by gel 
electrophoresis or other means. 

The most preferred method of achieving such amplification employs the polymerase 
chain reaction ("PCR") (Mullis et al., Cold Spring Harbor Symp. Quant. Biol. 51:263-273 
(1986); Erlich et al., European Patent Appln. 50,424; European Patent Appln. 84,796; European 
Patent Application 258,017; European Patent Appln. 237,362; Mullis, European Patent Appln. 
201,184; Mullis et al., U.S. Patent No. 4,683,202; Erlich, U.S. Patent No. 4,582,788; and Saiki et 
al., U.S. Patent No. 4,683,194), using primer pairs that are capable of hybridizing to the proximal 
sequences that define a polymorphism in its double-stranded form. 

In lieu of PCR, alternative methods, such as the "Ligase Chain Reaction" ("LCR") may 
be used (Barany, Proc. Natl. Acad. Sci. (U.S.A.) 88:189-193 (1991)). LCR uses two pairs of 
oligonucleotide probes to exponentially amplify a specific target. The sequences of each pair of 
oligonucleotides are selected to permit the pair to hybridize to abutting sequences of the same 
strand of the target. Such hybridization forms a substrate for a template-dependent ligase. As 
with PCR, the resulting products thus serve as a template in subsequent cycles and an 
exponential amplification of the desired sequence is obtained. 

LCR can be performed with oligonucleotides having the proximal and distal sequences of 
the same strand of a polymorphic site. In one embodiment, either oligonucleotide will be 
designed to include the actual polymorphic site of the polymorphism. In such an embodiment, 
the reaction conditions are selected such that the oligonucleotides can be ligated together only if 
the target molecule either contains or lacks the specific nucleotide that is complementary to the 
polymorphic site present on the oligonucleotide. Alternatively, the oligonucleotides may be 
selected such that they do not include the polymorphic site (see, Segev, PCT Application WO 
90/01069). 

The "Oligonucleotide Ligation Assay" ("OLA") may alternatively be employed 
(Landegren et al., Science 241:1077-1080 (1988)). The OLA protocol uses two oligonucleotides 
that are designed to be capable of hybridizing to abutting sequences of a single strand of a target. 
OLA, like LCR, is particularly suited for the detection of point mutations. Unlike LCR, 
however, OLA results in "linear" rather than exponential amplification of the target sequence. 
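
The difference between exponential and linear amplification can be made concrete with an idealized calculation, sketched below. The model is an assumption made for illustration and is not drawn from the cited references: it supposes a fixed per-cycle efficiency, no reagent depletion, and, for the linear case, that ligated products do not themselves act as templates. The function names and the default efficiency of 1.0 are hypothetical.

```python
def exponential_copies(cycles: int, efficiency: float = 1.0) -> float:
    """Copies per starting target when products serve as templates (PCR, LCR).

    Each cycle multiplies the template pool by (1 + efficiency).
    """
    return (1.0 + efficiency) ** cycles


def linear_products(cycles: int, efficiency: float = 1.0) -> float:
    """Ligated products per starting target when only the original target
    templates each cycle (OLA-style linear accumulation)."""
    return efficiency * cycles


# Idealized comparison after 20 cycles at perfect efficiency.
lcr_like = exponential_copies(20)
ola_like = linear_products(20)
```

Under these idealized assumptions, twenty cycles give about a million-fold exponential increase but only twenty ligated products per target in the linear case, which is why linear methods are often coupled to a prior exponential step.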

Nickerson et al. have described a nucleic acid detection assay that combines attributes of 
PCR and OLA (Nickerson et al., Proc. Natl. Acad. Sci. (U.S.A.) 87:8923-8927 (1990)). In this 
method, PCR is used to achieve the exponential amplification of target DNA, which is then 
detected using OLA. In addition to requiring multiple, and separate, processing steps, one 
problem associated with such combinations is that they inherit all of the problems associated 
with PCR and OLA. 

Schemes based on ligation of two (or more) oligonucleotides in the presence of a nucleic 
acid having the sequence of the resulting "di-oligonucleotide," thereby amplifying the 
di-oligonucleotide, are also known (Wu et al., Genomics 4:560-569 (1989)), and may be readily 
adapted to the purposes of the present invention. 

Other known nucleic acid amplification procedures, such as allele-specific oligomers, 
branched DNA technology, transcription-based amplification systems, or isothermal 
amplification methods may also be used to amplify and analyze such polymorphisms (Malek et 
al., U.S. Patent 5,130,238; Davey et al., European Patent Application 329,822; Schuster et al., 
U.S. Patent 5,169,766; Miller et al., PCT Patent Application WO 89/06700; Kwoh et al., Proc. 
Natl. Acad. Sci. (U.S.A.) 86:1173-1177 (1989); Gingeras et al., PCT Patent Application WO 
88/10315; Walker et al., Proc. Natl. Acad. Sci. (U.S.A.) 89:392-396 (1992)). 

Polymorphisms can also be identified by Single Strand Conformation Polymorphism 
(SSCP) analysis. SSCP is a method capable of identifying most sequence variations in a single 
strand of DNA, typically between 150 and 250 nucleotides in length (Elles, Methods in 
Molecular Medicine: Molecular Diagnosis of Genetic Diseases, Humana Press (1996); Orita et 
al., Genomics 5:874-879 (1989)). Under denaturing conditions a single strand of DNA will 
adopt a conformation that is uniquely dependent on its sequence conformation. This 
conformation usually will be different, even if only a single base is changed. Most 
conformations have been reported to alter the physical configuration or size sufficiently to be 
detectable by electrophoresis. A number of protocols have been described for SSCP including, 
but not limited to, Lee et al., Anal. Biochem. 205:289-293 (1992); Suzuki et al., Anal. Biochem. 
192:82-84 (1991); Lo et al., Nucleic Acids Research 20:1005-1009 (1992); Sarkar et al., 
Genomics 13:441-443 (1992). It is understood that one or more of the nucleic acids of the 
present invention can be utilized as markers or probes to detect polymorphisms by SSCP 
analysis. 

Polymorphisms may also be found using random amplified polymorphic DNA (RAPD) 
(Williams et al., Nucl. Acids Res. 18:6531-6535 (1990)) and cleavable amplified polymorphic 
sequences (CAPS) (Lyamichev et al., Science 260:778-783 (1993)). It is understood that one or 
more of the nucleic acid molecules of the present invention can be utilized as markers or probes 
to detect polymorphisms by RAPD or CAPS analysis. 

The identification of a polymorphism can be determined in a variety of ways. By 
correlating the presence or absence of the polymorphism in a plant with the presence or absence 
of a phenotype, it is possible to predict the phenotype of that plant. If a polymorphism creates or 
destroys a restriction endonuclease cleavage site, or if it results in the loss or insertion of DNA 
(e.g., a variable nucleotide tandem repeat (VNTR) polymorphism), it will alter the size or profile 
of the DNA fragments that are generated by digestion with that restriction endonuclease. As such, 
individuals that possess a variant sequence can be distinguished from those having the original 
sequence by restriction fragment analysis. Polymorphisms that can be identified in this manner 
are termed "restriction fragment length polymorphisms" ("RFLPs"). RFLPs have been widely 
used in human and plant genetic analyses (Glassberg, UK Patent Application 2135774; Skolnick 
et al., Cytogen. Cell Genet. 32:58-67 (1982); Botstein et al., Am. J. Hum. Genet. 32:314-331 
(1980); Fischer et al., PCT Application WO90/13668; Uhlen, PCT Application WO90/11369). 
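
How a single base change alters a restriction fragment profile can be sketched with an in silico digest. The example is illustrative only: the two allele sequences are invented, the helper name is hypothetical, and a linear molecule is assumed. EcoRI is used because its recognition sequence (GAATTC, cut between G and A on the top strand) is well established; it is not singled out by the text.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Fragment lengths after cutting a linear molecule at every `site`.

    `cut_offset` is the position of the cut within the recognition site on
    the top strand (1 for EcoRI, which cuts G^AATTC).
    """
    seq = sequence.upper()
    cut_points = []
    start = seq.find(site)
    while start != -1:
        cut_points.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cut_points + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical 30 bp alleles differing by one base inside the EcoRI site.
allele_with_site = "A" * 10 + "GAATTC" + "T" * 14
allele_without_site = "A" * 10 + "GAGTTC" + "T" * 14

fragments_a = digest(allele_with_site)
fragments_b = digest(allele_without_site)
```

The allele carrying the intact site yields two fragments, while the allele with the single base substitution remains uncut, so the two are distinguishable by fragment size alone.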

A central attribute of "single nucleotide polymorphisms," or "SNPs" is that the site of the 
polymorphism is at a single nucleotide. SNPs have certain reported advantages over RFLPs and 
VNTRs. First, SNPs are more stable than other classes of polymorphisms. Their spontaneous 
mutation rate is approximately 10^-9 (Kornberg, DNA Replication, W.H. Freeman & Co., San 
Francisco, 1980), approximately 1,000 times less frequent than VNTRs (U.S. Patent 5,679,524). 
Second, SNPs occur at greater frequency, and with greater uniformity than RFLPs and VNTRs. 
As SNPs result from sequence variation, new polymorphisms can be identified by sequencing 
random genomic or cDNA molecules. SNPs can also result from deletions, point mutations and 
insertions. Any single base alteration, whatever the cause, can be an SNP. The greater 
frequency of SNPs means that they can be more readily identified than the other classes of 
polymorphisms. 

SNPs and insertion/deletions can be detected by any of a variety of methods, 
including those disclosed in U.S. Patents 5,210,015; 5,876,930 and 6,030,787, in which an 
oligonucleotide probe having reporter and quencher molecules is hybridized to a target 
polynucleotide. The probe is degraded by 5' -> 3' exonuclease activity of a nucleic acid 
polymerase. A useful assay is available from AB Biosystems (850 Lincoln Centre Drive, Foster 
City, CA) as the Taqman® assay. 

Specific nucleotide variations such as SNPs and insertion/deletions can also be detected 
by labeled base extension methods as disclosed in U.S. Patents 6,004,744; 6,013,431; 5,595,890; 
5,762,876; and 5,945,283. These methods are based on primer extension and incorporation of 
detectable nucleoside triphosphates. The primer is designed to anneal to the sequence 
immediately adjacent to the variable nucleotide, which can be detected after incorporation 
of as few as one labeled nucleoside triphosphate. U.S. Patent 5,468,613 discloses allele specific 
oligonucleotide hybridizations where single or multiple nucleotide variations in nucleic acid 
sequence can be detected in nucleic acids by a process in which the sequence containing the 
nucleotide variation is amplified, spotted on a membrane and treated with a labeled sequence- 
specific oligonucleotide probe. 

Such methods also include the direct or indirect sequencing of the site, the use of 
restriction enzymes where the respective alleles of the site create or destroy a restriction site, the 
use of allele-specific hybridization probes, the use of antibodies that are specific for the proteins 
encoded by the different alleles of the polymorphism or by other biochemical interpretation. 
SNPs can be sequenced by a number of methods. Two basic methods may be used for DNA 
sequencing: the chain termination method of Sanger et al., Proc. Natl. Acad. Sci. (U.S.A.) 74: 
5463-5467 (1977), and the chemical degradation method of Maxam and Gilbert, Proc. Natl. 
Acad. Sci. (U.S.A.) 74: 560-564 (1977). Automation and advances in technology such as the 
replacement of radioisotopes with fluorescence-based sequencing have reduced the effort 
required to sequence DNA (Craxton, Methods, 2: 20-26 (1991); Ju et al., Proc. Natl. Acad. Sci. 
(U.S.A.) 92: 4347-4351 (1995); Tabor and Richardson, Proc. Natl. Acad. Sci. (U.S.A.) 92: 6339- 
6343 (1995)). Automated sequencers are available from, for example, Pharmacia Biotech, Inc., 
Piscataway, New Jersey (Pharmacia ALF), LI-COR, Inc., Lincoln, Nebraska (LI-COR 4,000) 
and Millipore, Bedford, Massachusetts (Millipore BaseStation). 

In addition, advances in capillary gel electrophoresis have also reduced the effort 
required to sequence DNA and such advances provide a rapid high resolution approach for 
sequencing DNA samples (Swerdlow and Gesteland, Nucleic Acids Res. 18:1415-1419 (1990); 
Smith, Nature 349:812-813 (1991); Luckey et al., Methods Enzymol. 218:154-172 (1993); Lu et 
al., J. Chromatog. A. 680:491-501 (1994); Carson et al., Anal. Chem. 65:3219-3226 (1993); 
Huang et al., Anal. Chem. 64:2149-2154 (1992); Kheterpal et al., Electrophoresis 17:1852-1859 
(1996); Quesada and Zhang, Electrophoresis 17:1841-1851 (1996); Baba, Yakugaku Zasshi 
117:265-281 (1997); Marino, Appl. Theor. Electrophor. 5:1-5 (1995)). 

The genetic linkage of marker molecules can be established by a gene mapping model 
such as, without limitation, the flanking marker model reported by Lander and Botstein, 
Genetics, 121:185-199 (1989), and the interval mapping, based on maximum likelihood methods 
described by Lander and Botstein, Genetics, 121:185-199 (1989), and implemented in the 
software package MAPMAKER/QTL (Lincoln and Lander, Mapping Genes Controlling 
Quantitative Traits Using MAPMAKER/QTL, Whitehead Institute for Biomedical Research, 
Massachusetts, (1990)). Additional software includes Qgene, Version 2.23 (1996) (Department of 
Plant Breeding and Biometry, 266 Emerson Hall, Cornell University, Ithaca, NY). Use of Qgene 
software is a particularly preferred approach. 

A maximum likelihood estimate (MLE) for the presence of a marker is calculated, 
together with an MLE assuming no QTL effect, to avoid false positives. A log10 of an odds ratio 
(LOD) is then calculated as: LOD = log10 (MLE for the presence of a QTL/MLE given no linked 
QTL). 

The LOD score essentially indicates how much more likely the data are to have arisen 
assuming the presence of a QTL than in its absence. The LOD threshold value for avoiding a 
false positive with a given confidence, say 95%, depends on the number of markers and the 
length of the genome. Graphs indicating LOD thresholds are set forth in Lander and Botstein, 
Genetics, 121:185-199 (1989), and further described by Arus and Moreno-Gonzalez, Plant 
Breeding, Hayward, Bosemark, Romagosa (eds.) Chapman & Hall, London, pp. 314-331 (1993). 
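
The LOD ratio described above can be illustrated with a minimal Python sketch. The function names and the example likelihood values are hypothetical and are not taken from MAPMAKER/QTL or Qgene; in practice the two maximized likelihoods would be supplied by the mapping software.

```python
import math


def lod_score(likelihood_with_qtl: float, likelihood_no_qtl: float) -> float:
    """Return log10 of the ratio of the two maximized likelihoods."""
    if likelihood_with_qtl <= 0 or likelihood_no_qtl <= 0:
        raise ValueError("likelihoods must be positive")
    return math.log10(likelihood_with_qtl / likelihood_no_qtl)


def lod_from_log_likelihoods(ln_l_with_qtl: float, ln_l_no_qtl: float) -> float:
    """Same quantity computed from natural-log likelihoods.

    Working in log space avoids underflow when likelihoods are very small.
    """
    return (ln_l_with_qtl - ln_l_no_qtl) / math.log(10)


# Hypothetical example: the data are 1,000 times more likely under the
# QTL model than under the no-QTL model.
example_lod = lod_score(1e-20, 1e-23)
```

A LOD score of 3 in this sketch corresponds to data that are 1,000 times more likely to have arisen in the presence of a QTL than in its absence.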

Additional models can be used. Many modifications and alternative approaches to 
interval mapping have been reported, including the use of non-parametric methods (Kruglyak 
and Lander, Genetics, 139:1421-1428 (1995)). Multiple regression methods or models can 
also be used, in which the trait is regressed on a large number of markers (Jansen, Biometrics in 
Plant Breeding, van Oijen, Jansen (eds.) Proceedings of the Ninth Meeting of the Eucarpia Section 
Biometrics in Plant Breeding, The Netherlands, pp. 116-124 (1994); Weber and Wricke, 
Advances in Plant Breeding, Blackwell, Berlin, 16 (1994)). Procedures combining interval 
mapping with regression analysis, whereby the phenotype is regressed onto a single putative 
QTL at a given marker interval, and at the same time onto a number of markers that serve as 
'cofactors,' have been reported by Jansen and Stam, Genetics, 136:1447-1455 (1994) and Zeng, 
Genetics, 136:1457-1468 (1994). Generally, the use of cofactors reduces the bias and sampling 
error of the estimated QTL positions (Utz and Melchinger, Biometrics in Plant Breeding, van 
Oijen, Jansen (eds.) Proceedings of the Ninth Meeting of the Eucarpia Section Biometrics in 
Plant Breeding, The Netherlands, pp. 195-204 (1994)), thereby improving the precision and 
efficiency of QTL mapping (Zeng, Genetics, 136:1457-1468 (1994)). These models can be 
extended to multi-environment experiments to analyze genotype-environment interactions 
(Jansen et al., Theor. Appl. Genet. 91:33-37 (1995)). 
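
As a simplified illustration of regressing a trait on marker genotypes, the following Python sketch fits an ordinary least-squares line for a single marker. It is a single-marker regression only, not the cofactor-based procedures of Jansen and Stam or Zeng cited above. The genotype coding (0, 1, 2 copies of one parental allele), the function name, and the example data are assumptions made for illustration.

```python
def single_marker_regression(genotypes, phenotypes):
    """Regress phenotype on a numeric marker genotype code.

    Returns (slope, intercept, r_squared) from ordinary least squares.
    """
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((p - mean_p) ** 2 for p in phenotypes)
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(genotypes, phenotypes))
    if sxx == 0 or syy == 0:
        raise ValueError("marker or trait does not vary in this sample")
    slope = sxy / sxx
    intercept = mean_p - slope * mean_g
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data: six plants scored at one codominant marker.
marker_codes = [0, 1, 2, 0, 1, 2]
trait_values = [10.0, 12.0, 14.0, 10.0, 12.0, 14.0]
slope, intercept, r_squared = single_marker_regression(marker_codes, trait_values)
```

In a multiple regression setting the same idea is extended to many markers at once, which requires a matrix solver rather than the closed-form expressions used here.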

Selection of an appropriate mapping or segregation population is important to map 
construction. The choice of appropriate mapping population depends on the type of marker 
systems employed (Tanksley et al., Molecular mapping of plant chromosomes, in Chromosome 
Structure and Function: Impact of New Concepts, J.P. Gustafson and R. Appels (eds.), Plenum 
Press, New York, pp. 157-173 (1988)). Consideration must be given to the source of parents 
(adapted vs. exotic) used in the mapping population. Chromosome pairing and recombination 
rates can be severely disturbed (suppressed) in wide crosses (adapted x exotic) and generally 
yield greatly reduced linkage distances. Wide crosses will usually provide segregating 
populations with a relatively large array of polymorphisms when compared to progeny in a 
narrow cross (adapted x adapted). 

As used herein, the progeny include not only, without limitation, the products of any 
cross (be it a backcross or otherwise) between two plants, but all progeny whose pedigree traces 
back to the original cross. Specifically, without limitation, such progeny include plants that have 
12.5% or less genetic material derived from one of the two originally crossed plants. As used 
herein, a second plant is derived from a first plant if the second plant's pedigree includes the first 
plant. 

An F2 population is the first generation of selfing after the hybrid seed is produced. 
Usually a single F1 plant is selfed to generate a population segregating for all the genes in 
Mendelian (1:2:1) fashion. Maximum genetic information is obtained from a completely 
classified F2 population using a codominant marker system (Mather, Measurement of Linkage in 
Heredity: Methuen and Co., (1938)). In the case of dominant markers, progeny tests (e.g., F3, 
BCF2) are required to identify the heterozygotes, thus making it equivalent to a completely 
classified F2 population. However, this procedure is often prohibitive because of the cost and 
time involved in progeny testing. Progeny testing of F2 individuals is often used in map 
construction where phenotypes do not consistently reflect genotype (e.g., disease resistance) or 
where trait expression is controlled by a QTL. Segregation data from progeny test populations 
(e.g., F3 or BCF2) can be used in map construction. Marker-assisted selection can then be 
applied to cross progeny based on marker-trait map associations (F2, F3), where linkage groups 
have not been completely disassociated by recombination events (i.e., maximum disequilibrium). 
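
Whether a codominant marker segregates in the expected Mendelian 1:2:1 fashion in an F2 population can be checked with a chi-square goodness-of-fit test. The sketch below is illustrative only; the function names and genotype counts are hypothetical, and the 5% critical value for two degrees of freedom (5.991) is the standard tabulated value.

```python
CRITICAL_VALUE_5PCT_2DF = 5.991  # chi-square critical value, alpha = 0.05, 2 d.f.


def chi_square_1_2_1(n_parent_a: int, n_het: int, n_parent_b: int) -> float:
    """Chi-square statistic for observed F2 genotype counts against 1:2:1."""
    total = n_parent_a + n_het + n_parent_b
    if total == 0:
        raise ValueError("no plants were scored")
    observed = (n_parent_a, n_het, n_parent_b)
    expected = (total * 0.25, total * 0.5, total * 0.25)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fits_1_2_1(n_parent_a: int, n_het: int, n_parent_b: int) -> bool:
    """True if the counts do not deviate from 1:2:1 at the 5% level."""
    return chi_square_1_2_1(n_parent_a, n_het, n_parent_b) <= CRITICAL_VALUE_5PCT_2DF


# Hypothetical counts for 100 F2 plants scored at one marker.
statistic = chi_square_1_2_1(40, 40, 20)
```

Markers showing strong segregation distortion under such a test are commonly examined more closely before being placed on a map.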

Recombinant inbred lines (RIL) (genetically related lines; usually >F5, developed from 
continuously selfing F2 lines towards homozygosity) can be used as a mapping population. 
Information obtained from dominant markers can be maximized by using RIL because all loci 
are homozygous or nearly so. Under conditions of tight linkage (i.e., about <10% 
recombination), dominant and co-dominant markers evaluated in RIL populations provide more 
information per individual than either marker type in backcross populations (Reiter et al., Proc. 
Natl. Acad. Sci. (U.S.A.) 89: 1477-1481 (1992)). However, as the distance between markers 
becomes larger (i.e., loci become more independent), the information in RIL populations 
decreases dramatically when compared to codominant markers. 
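
The statement that loci in RIL populations are homozygous or nearly so follows from the halving of heterozygosity with each generation of selfing. A minimal sketch, assuming a locus that is polymorphic between two inbred parents, no selection, and strict selfing (the function names are hypothetical):

```python
def expected_heterozygosity(generation: int) -> float:
    """Expected fraction of heterozygous plants at a locus in the F<generation>.

    The F1 is fully heterozygous at loci where the inbred parents differ,
    and each generation of selfing halves the expected heterozygosity.
    """
    if generation < 1:
        raise ValueError("generation must be 1 (F1) or later")
    return 0.5 ** (generation - 1)


def expected_homozygosity(generation: int) -> float:
    """Complement of the expected heterozygosity."""
    return 1.0 - expected_heterozygosity(generation)


# Expected heterozygosity for the F2 through F7 generations.
by_generation = {f"F{g}": expected_heterozygosity(g) for g in range(2, 8)}
```

Under these assumptions the expected heterozygosity is 50% in the F2, 6.25% in the F5 and about 3.1% in the F6.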

Backcross populations (e.g., generated from a cross between a successful variety 
(recurrent parent) and another variety (donor parent) carrying a trait not present in the former) 
can be utilized as a mapping population. A series of backcrosses to the recurrent parent can be 
made to recover most of its desirable traits. Thus a population is created consisting of 
individuals nearly like the recurrent parent but each individual carries varying amounts or mosaic 
of genomic regions from the donor parent. Backcross populations can be useful for mapping 
dominant markers if all loci in the recurrent parent are homozygous and the donor and recurrent 
parent have contrasting polymorphic marker alleles (Reiter et al., Proc. Natl. Acad. Sci. (U.S.A.) 
89:1477-1481 (1992)). Information obtained from backcross populations using either 
codominant or dominant markers is less than that obtained from F2 populations because one, 
rather than two, recombinant gametes are sampled per plant. Backcross populations, however, 
are more informative (at low marker saturation) when compared to RILs as the distance between 
linked loci increases in RIL populations (i.e., about .15% recombination). Increased 
recombination can be beneficial for resolution of tight linkages, but may be undesirable in the 
construction of maps with low marker saturation. 

Near-isogenic lines (NIL) created by many backcrosses to produce an array of individuals 
that are nearly identical in genetic composition except for the trait or genomic region under 
interrogation can be used as a mapping population. In mapping with NILs, only a portion of the 
polymorphic loci are expected to map to a selected region. 
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Bulk segregant analysis (BSA) is a method developed for the rapid identification of 
linkage between markers and traits of interest (Michelmore et al., Proc. Natl. Acad. Sci. (U.S.A.) 
88:9828-9832 (1991)). In BSA, two bulked DNA samples are drawn from a segregating 
population originating from a single cross. These bulks contain individuals that are identical for 
a particular trait (resistant or sensitive to particular disease) or genomic region but arbitrary at 
unlinked regions (i.e., heterozygous). Regions unlinked to the target region will not differ 
between the bulked samples of many individuals in BSA. 
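
The logic of BSA can be sketched in Python as follows: markers whose detected alleles differ between the two bulks are candidates for linkage to the trait, while markers showing the same alleles in both bulks are presumed unlinked. The data layout (marker name mapped to the set of alleles detected in the pooled DNA), the marker names, and the function name are all hypothetical.

```python
def candidate_linked_markers(resistant_bulk: dict, sensitive_bulk: dict) -> list:
    """Return markers whose detected allele sets differ between the two bulks.

    Only markers scored in both bulks are compared.
    """
    scored_in_both = resistant_bulk.keys() & sensitive_bulk.keys()
    return sorted(
        marker
        for marker in scored_in_both
        if resistant_bulk[marker] != sensitive_bulk[marker]
    )


# Hypothetical allele calls from two bulked DNA samples.
resistant = {"M1": {"a"}, "M2": {"a", "b"}, "M3": {"a", "b"}}
sensitive = {"M1": {"b"}, "M2": {"a", "b"}, "M3": {"a", "b"}}
candidates = candidate_linked_markers(resistant, sensitive)
```

In this example only the hypothetical marker M1 differs between the bulks and would be carried forward for linkage testing in individual plants.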

Plants generated using a method of the present invention can be part of or generated from 
a breeding program. The choice of breeding method depends on the mode of plant reproduction, 
the heritability of the trait(s) being improved, and the type of cultivar used commercially (e.g., F1 
hybrid cultivar, pureline cultivar, etc.). Selected, non-limiting approaches for breeding the plants 
of the present invention are set forth below. A breeding program can be enhanced using marker 
assisted selection of the progeny of any cross. It is further understood that any commercial and 
non-commercial cultivars can be utilized in a breeding program. Factors such as, for example, 
emergence vigor, vegetative vigor, stress tolerance, disease resistance, branching, flowering, 
seed set, seed size, seed density, standability, and threshability will generally dictate the 
choice. 

For highly heritable traits, a choice of superior individual plants evaluated at a single 
location will be effective, whereas for traits with low heritability, selection should be based on 
mean values obtained from replicated evaluations of families of related plants. Popular selection 
methods commonly include pedigree selection, modified pedigree selection, mass selection, and 
recurrent selection. In a preferred embodiment a backcross or recurrent breeding program is 
undertaken. 

The complexity of inheritance influences choice of the breeding method. Backcross 
breeding can be used to transfer one or a few favorable genes for a highly heritable trait into a 
desirable cultivar. This approach has been used extensively for breeding disease-resistant 
cultivars. Various recurrent selection techniques are used to improve quantitatively inherited 
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traits controlled by numerous genes. The use of recurrent selection in self-pollinating crops 
depends on the ease of pollination, the frequency of successful hybrids from each pollination, 
and the number of hybrid offspring from each successful cross. 

Breeding lines can be tested and compared to appropriate standards in environments 
representative of the commercial target area(s) for two or more generations. The best lines are 
candidates for new commercial cultivars; those still deficient in traits may be used as parents to 
produce new populations for further selection. 

One method of identifying a superior plant is to observe its performance relative to other 
experimental plants and to a widely grown standard cultivar. If a single observation is 
inconclusive, replicated observations can provide a better estimate of its genetic worth. A 
breeder can select and cross two or more parental lines, followed by repeated selfing and 
selection, producing many new genetic combinations. 

The development of new soybean cultivars requires the development and selection of 
soybean varieties, the crossing of these varieties and selection of superior hybrid crosses. The 
hybrid seed can be produced by manual crosses between selected male-fertile parents or by using 
male sterility systems. Hybrids are selected for certain single gene traits such as pod color, 
flower color, seed yield, pubescence color or herbicide resistance which indicate that the seed is 
truly a hybrid. Additional data on parental lines, as well as the phenotype of the hybrid, 
influence the breeder's decision whether to continue with the specific hybrid cross. 

Pedigree breeding and recurrent selection breeding methods can be used to develop 
cultivars from breeding populations. Breeding programs combine desirable traits from two or 
more cultivars or various broad-based sources into breeding pools from which cultivars are 
developed by selfing and selection of desired phenotypes. New cultivars can be evaluated to 
determine which have commercial potential. 

Pedigree breeding is used commonly for the improvement of self-pollinating crops. Two 
parents who possess favorable, complementary traits are crossed to produce an F1. An F2 
population is produced by selfing one or several F1's. Selection of the best individuals in the best 
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families is performed. Replicated testing of families can begin in the F4 generation to improve 
the effectiveness of selection for traits with low heritability. At an advanced stage of inbreeding 
(i.e., F6 and F7), the best lines or mixtures of phenotypically similar lines are tested for potential 
release as new cultivars. 
Backcross breeding has been used to transfer genes for a simply inherited, highly 
heritable trait into a desirable homozygous cultivar or inbred line, which is the recurrent parent. 
The source of the trait to be transferred is called the donor parent. The resulting plant is 
expected to have the attributes of the recurrent parent (e.g., cultivar) and the desirable trait 
transferred from the donor parent. After the initial cross, individuals possessing the phenotype of 
the donor parent are selected and repeatedly crossed (backcrossed) to the recurrent parent. The 
resulting plant is expected to have the attributes of the recurrent parent (e.g., cultivar) and the 
desirable trait transferred from the donor parent. 
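
The recovery of the recurrent parent genome during backcrossing can be estimated with a short Python sketch. It assumes no selection and unlinked loci, so that the expected donor contribution halves with each backcross; the function names are hypothetical.

```python
def expected_donor_fraction(n_backcrosses: int) -> float:
    """Expected fraction of the genome derived from the donor parent.

    Zero backcrosses corresponds to the F1 (50% donor); each backcross to
    the recurrent parent halves the expected donor contribution.
    """
    if n_backcrosses < 0:
        raise ValueError("number of backcrosses cannot be negative")
    return 0.5 ** (n_backcrosses + 1)


def expected_recurrent_fraction(n_backcrosses: int) -> float:
    """Expected fraction of the genome derived from the recurrent parent."""
    return 1.0 - expected_donor_fraction(n_backcrosses)


# Expected donor genome fraction for the F1 (0) through BC4 (4).
donor_by_generation = [expected_donor_fraction(n) for n in range(5)]
```

Under these assumptions the expected donor fraction is 25% in the BC1 and 12.5% in the BC2, with the recurrent parent accounting for the remainder.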

The single-seed descent procedure in the strict sense refers to planting a segregating 
population, harvesting a sample of one seed per plant, and using the one-seed sample to plant the 
next generation. When the population has been advanced from the F2 to the desired level of 
inbreeding, the plants from which lines are derived will each trace to different F2 individuals. 
The number of plants in a population declines each generation due to failure of some seeds to 
germinate or some plants to produce at least one seed. As a result, not all of the F2 plants 
originally sampled in the population will be represented by a progeny when generation advance 
is completed. 

In a multiple-seed procedure, soybean breeders commonly harvest one or more pods from 
each plant in a population and thresh them together to form a bulk. Part of the bulk is used to 
plant the next generation and part is put in reserve. The procedure has been referred to as 
modified single-seed descent or the pod-bulk technique. 
The multiple-seed procedure has been used to save labor at harvest. It is considerably 

faster to thresh pods with a machine than to remove one seed from each by hand for the single- 
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seed procedure. The multiple-seed procedure also makes it possible to plant the same number of 
seeds of a population each generation of inbreeding. 

Descriptions of other breeding methods that are commonly used for different traits and 
crops can be found in one of several reference books (e.g., Fehr, Principles of Cultivar 
Development Vol. 1, pp. 2-3 (1987)). 

In a preferred aspect of the present invention the source of the rhg1 SCN resistant allele 
for use in a breeding program is derived from a plant selected from the group consisting of 
PI548402 (Peking), PI200499, A2869, Jack, A2069, PI209332 (No:4), PI404166 
(Krasnoaarmejkaja), PI404198 (Sun huan do), PI437654 (Er-hej-jan), PI438489 (Chiquita), 
PI507354 (Tokei 421), PI548655 (Forrest), PI548988 (Pickett), PI84751, PI437654, PI40792, 
Pyramid, Nathan, AG2201, A3469, AG3901, A3904, AG4301, AG4401, AG4501, AG4601, 
PION9492, PI88788, Dyer, Custer, Manokin, Doles, and SCN resistant progeny thereof (USDA, 
Soybean Germplasm Collection, University of Illinois, Illinois). In a more preferred aspect, the 
source of the rhg1 SCN resistant allele for use in a breeding program is derived from a plant 
selected from the group consisting of PI548402 (Peking), PI404166 (Krasnoaarmejkaja), 
PI404198 (Sun huan do), PI437654 (Er-hej-jan), PI438489 (Chiquita), PI507354 (Tokei 421), 
PI548655 (Forrest), PI548988 (Pickett), PI84751, PI437654, PI40792, and SCN resistant 
progeny thereof. 

In a preferred aspect of the present invention the source of the rhg1 SCN sensitive allele 
for use in a breeding program is derived from a plant selected from the group consisting of 
A3244, A2833, AG3001, Williams, Will, A2704, Noir, DK23-51, Lee 74, Essex, Minsoy, 
A1923, Hutcheson, and SCN sensitive progeny thereof. In a more preferred aspect, the source of 
the rhg1 SCN sensitive allele for use in a breeding program is derived from an A3244 plant, and 
SCN sensitive progeny thereof. 

In a preferred aspect of the present invention the source of the Rhg4 SCN resistant allele 
for use in a breeding program is derived from a plant selected from the group consisting of 
PI548402 (Peking), PI437654 (Er-hej-jan), PI438489 (Chiquita), PI507354 (Tokei 421), 
PI548655 (Forrest), PI548988 (Pickett), PI88788, PI404198 (Sun huan do), PI404166 
(Krasnoaarmejkaja), Hartwig, Manokin, Doles, Dyer, Custer, and SCN resistant progeny thereof. 
In a more preferred aspect, the source of the Rhg4 SCN resistant allele for use in a breeding 
program is derived from a plant selected from the group consisting of PI548402 (Peking), 
PI88788, PI404198 (Sun huan do), PI438489 (Chiquita), PI437654 (Er-hej-jan), PI404166 
(Krasnoaarmejkaja), PI548655 (Forrest), PI548988 (Pickett), PI507354 (Tokei 421), and SCN 
resistant progeny thereof. 

In a preferred aspect of the present invention the source of the Rhg4 SCN sensitive allele 
for use in a breeding program is derived from a plant selected from the group consisting of 
A3244, Will, Noir, Lee 74, Essex, Minsoy, A2704, A2833, AG3001, Williams, DK23-51, 
Hutcheson, and SCN sensitive progeny thereof. In a more preferred aspect, the source of the 
Rhg4 SCN sensitive allele for use in a breeding program is derived from an A3244 plant, and 
SCN sensitive progeny thereof. 

As used herein linkage of a nucleic acid sequence with another nucleic acid sequence 
may be genetic or physical. In a preferred embodiment, a nucleic acid marker is genetically 
linked to either rhg1 or Rhg4, where the marker nucleic acid molecule exhibits a LOD score of 
greater than 2.0, as judged by interval mapping, for SCN resistance or partial resistance, 
preferably where the marker nucleic acid molecule exhibits a LOD score of greater than 3.0, as 
judged by interval mapping, for SCN resistance or partial resistance, more preferably where the 
marker nucleic acid molecule exhibits a LOD score of greater than 3.5, as judged by interval 
mapping, for SCN resistance or partial resistance and even more preferably where the marker 
nucleic acid molecule exhibits a LOD score of about 4.0, as judged by interval mapping, for 
SCN resistance or partial resistance based on maximum likelihood methods described by Lander 
and Botstein, Genetics, 121:185-199 (1989), and implemented in the software package 
MAPMAKER/QTL (default parameters) (Lincoln and Lander, Mapping Genes Controlling 
Quantitative Traits Using MAPMAKER/QTL, Whitehead Institute for Biomedical Research, 
Massachusetts, (1990)). 

In another embodiment the nucleic acid molecule may be physically linked to either rhg1 
or Rhg4. In a preferred embodiment, the nucleic acid marker specifically hybridizes to a nucleic 
acid molecule having a sequence that is present on linkage group G within 500 kb or 100 kb, 
more preferably within 50 kb, even more preferably within 25 kb of an rhg1 allele, where the rhg1 
allele is preferably a sensitive allele, and more preferably a sensitive allele from A3244. In a 
preferred embodiment the nucleic acid marker is capable of specifically hybridizing to a nucleic 
acid molecule having a sequence that is present on linkage group A2 within 500 kb or 100 kb, 
more preferably within 50 kb, even more preferably within 25 kb of an Rhg4 allele, where the 
Rhg4 allele is preferably a sensitive allele, and more preferably a sensitive allele from A3244. 
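
The graded LOD thresholds recited above (greater than 2.0, 3.0 and 3.5) can be expressed as a simple classification, sketched here in Python. The tier numbering and function name are hypothetical conveniences for illustration, and the "about 4.0" embodiment is not modeled separately because the text does not define a tolerance for it.

```python
LOD_THRESHOLDS = (2.0, 3.0, 3.5)  # thresholds recited in the text above


def lod_tier(lod: float) -> int:
    """Return how many of the recited LOD thresholds a marker exceeds.

    0 means the marker does not exceed a LOD score of 2.0; 3 means it
    exceeds all three thresholds.
    """
    return sum(1 for threshold in LOD_THRESHOLDS if lod > threshold)


# Hypothetical interval-mapping results for three markers.
marker_lods = {"marker_A": 1.5, "marker_B": 3.2, "marker_C": 4.0}
tiers = {name: lod_tier(score) for name, score in marker_lods.items()}
```

Because each threshold is recited as "greater than", a marker with a LOD score of exactly 2.0 falls in tier 0 in this sketch.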

The present invention provides a method of investigating an rhg1 haplotype of a soybean 
plant comprising: (A) isolating nucleic acid molecules from the soybean plant; (B) determining 
the nucleic acid sequence of an rhg1 allele or part thereof; and, (C) comparing the nucleic acid 
sequence of the rhg1 allele or part thereof to a reference nucleic acid sequence. 

As used herein, the term "investigating" refers to any method capable of detecting a 
feature, such as a polymorphism or haplotype. Nucleic acid molecules only need to be isolated 
from a soybean plant to the degree of purity necessary for the task required or to a greater purity 
if desired. A person of ordinary skill in the art has available techniques to isolate nucleic acid 
molecules from plants to a sufficient purity, for example without limitation, to sequence the 
desired region of the nucleic acid molecule or to carry out a marker assay. 

The determination of an rhg1 or Rhg4 allele or part thereof may be carried out using any 
technique. Illustrative techniques that provide the nucleic acid 
sequence for an rhg1 or Rhg4 allele or part thereof include amplification of a desired allele or 
part thereof (see, for example, the Examples and SEQ ID NOs: 8-53). In a preferred 
embodiment, the nucleic acid sequence determined is that of an exon of an rhg1 allele, more 
preferably exon 1 or exon 3 of an rhg1 allele, or of an LRR domain. In another preferred 
embodiment, a single nucleotide is determined. In another preferred embodiment, the nucleic 
acid sequence determined is that of an LRR domain. 
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A comparison of a sequence with a reference sequence can be carried out with any 

appropriate sequence comparison method. 
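
One very simple comparison method, shown here only as a sketch, is a position-by-position comparison of two sequences that have already been aligned to equal length. The function name and the example sequences are hypothetical and are not drawn from the SEQ ID NOs of this disclosure; real comparisons would normally use an alignment program first.

```python
def find_differences(allele_seq: str, reference_seq: str) -> list:
    """List positions at which an aligned allele differs from the reference.

    Returns tuples of (1-based position, reference base, allele base).
    Both sequences must already be aligned and of equal length.
    """
    if len(allele_seq) != len(reference_seq):
        raise ValueError("sequences must be aligned to the same length")
    allele = allele_seq.upper()
    reference = reference_seq.upper()
    return [
        (position, ref_base, allele_base)
        for position, (allele_base, ref_base) in enumerate(zip(allele, reference), start=1)
        if allele_base != ref_base
    ]


# Hypothetical aligned fragments differing at a single nucleotide.
differences = find_differences("ACGTTA", "ACGATA")
```

Each reported tuple identifies a candidate polymorphism, such as a single nucleotide difference, between the sampled allele and the chosen reference sequence.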

As used herein, a reference sequence is any rhg1 allele sequence or consensus sequence. 
A reference sequence may be a nucleic acid sequence or an amino acid sequence. In a preferred 
embodiment, the reference sequence is any SCN resistant rhg1 allele sequence. In a further 
preferred embodiment, the rhg1 reference sequence is selected from the group consisting of SEQ 
ID NOs: 2, 3, 5, 6, 8-23, 28-43, 1097, 1098, and 1100-1115. 

The present invention provides a method of investigating an Rhg4 haplotype of a soybean 
plant comprising: (A) isolating nucleic acid molecules from the soybean plant; (B) determining 
the nucleic acid sequence of an Rhg4 allele or part thereof; and (C) comparing the nucleic acid 
sequence of the Rhg4 allele or part thereof to a reference nucleic acid sequence. 

As used herein, a reference sequence is any Rhg4 allele sequence or consensus sequence. 
A reference sequence may be a nucleic acid sequence or an amino acid sequence. In a preferred 
embodiment, the reference sequence is any SCN resistant Rhg4 allele sequence. In a further 
preferred embodiment, the Rhg4 reference sequence is selected from the group consisting of 
SEQ ID NOs: 4, 7, 44-47, 50-53, 1099, and 1116-1119. 

The present invention provides a method of introgressing SCN resistance or partial SCN 
resistance into a soybean plant comprising: performing marker assisted selection of the soybean 
plant with a nucleic acid marker, wherein the nucleic acid marker specifically hybridizes with a 
nucleic acid molecule having a first nucleic acid sequence that is physically linked to a second 
nucleic acid sequence that is located on linkage group G of soybean A3244, wherein the second 
nucleic acid sequence is within 500 kb of a third nucleic acid sequence which is capable of 
specifically hybridizing with the nucleic acid sequence of SEQ ID NO: 5, 6, complements 
thereof, or fragments thereof; and, selecting the soybean plant based on the marker assisted 
selection. 

The present invention provides a method of introgressing SCN resistance or partial SCN 
resistance into a soybean plant comprising: performing marker assisted selection of the soybean 
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plant with a nucleic acid marker, wherein the nucleic acid marker specifically hybridizes with a 
nucleic acid molecule having a first nucleic acid sequence that is physically linked to a second 
nucleic acid sequence that is located on linkage group A2 of soybean A3244, wherein the second 
nucleic acid sequence is within 500 kb of a third nucleic acid sequence which is capable of 
specifically hybridizing with the nucleic acid sequence of SEQ ID NO: 7, complements thereof, 
or fragments thereof; and, selecting the soybean plant based on the marker assisted selection. 

Marker assisted introgression of traits into plants has been reported. Marker assisted 
introgression involves the transfer of a chromosome region defined by one or more markers from 
one germplasm to a second germplasm. In a preferred embodiment the introgression is carried 
out by backcrossing with an rhg1 or Rhg4 SCN resistant soybean recurrent parent. 

In light of the current disclosure, plant introductions and germplasm can be screened with 
a marker nucleic acid molecule of the present invention to screen for alleles of rhg1 or Rhg4 
using one or more of the techniques disclosed herein or known in the art. 

The present invention also provides for parts of the plants produced by a method of the 
present invention. Plant parts, without limitation, include seed, endosperm, ovule and pollen. In 
a particularly preferred embodiment of the present invention, the plant part is a seed. 

Plants or parts thereof produced by a method of the present invention may be grown in 
culture and regenerated. Methods for the regeneration of soybean plants from various tissue 
types and methods for the tissue culture of soybean are known in the art (See, for example, 
Widholm et al., In Vitro Selection and Culture-induced Variation in Soybean, In Soybean: 
Genetics, Molecular Biology and Biotechnology, Eds. Verma and Shoemaker, CAB 
International, Wallingford, Oxon, England (1996)). Regeneration techniques for plants such as 
soybean can use as the starting material a variety of tissue or cell types. With soybean in 
particular, regeneration processes have been developed that begin with certain differentiated 
tissue types such as meristems, Cartha et al., Can. J. Bot. 59:1671-1679 (1981), hypocotyl 
sections, Cameya et al., Plant Science Letters 21: 289-294 (1981), and stem node segments, 
Saka et al., Plant Science Letters, 19: 193-201 (1980); Cheng et al., Plant Science Letters, 19: 
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91-99 (1980). Regeneration of whole sexually mature soybean plants from somatic embryos 
generated from explants of immature soybean embryos has been reported (Ranch et al., In Vitro 
Cellular & Developmental Biology 21: 653-658 (1985)). Regeneration of mature soybean plants 
from tissue culture by organogenesis and embryogenesis has also been reported (Barwale et al., 
Planta 167: 473-481 (1986); Wright et al., Plant Cell Reports 5: 150-154 (1986)). 
Agents 

One skilled in the art can refer to general reference texts for detailed descriptions of 
known techniques discussed herein or equivalent techniques. These texts include Current 
Protocols in Molecular Biology, Ausubel et al., eds., John Wiley & Sons, N.Y. (1989), and 
supplements through September (1998); Molecular Cloning, A Laboratory Manual, Sambrook et 
al., 2nd Ed., Cold Spring Harbor Press, Cold Spring Harbor, New York (1989); Genome Analysis: 
A Laboratory Manual 1: Analyzing DNA, Birren et al., Cold Spring Harbor Press, Cold Spring 
Harbor, New York (1997); Genome Analysis: A Laboratory Manual 2: Detecting Genes, Birren 
et al., Cold Spring Harbor Press, Cold Spring Harbor, New York (1998); Genome Analysis: A 
Laboratory Manual 3: Cloning Systems, Birren et al., Cold Spring Harbor Press, Cold Spring 
Harbor, New York (1999); Genome Analysis: A Laboratory Manual 4: Mapping Genomes, 
Birren et al., Cold Spring Harbor Press, Cold Spring Harbor, New York (1999); Plant Molecular 
Biology: A Laboratory Manual, Clark, Springer-Verlag, Berlin, (1997); Methods in Plant 
Molecular Biology, Maliga et al., Cold Spring Harbor Press, Cold Spring Harbor, New York 
(1995). These texts can, of course, also be referred to in making or using an aspect of the 
invention. It is understood that any of the agents of the invention can be substantially purified 
and/or be biologically active and/or recombinant. 
(a) Nucleic Acid Molecules 

Nucleic acid molecules of the present invention include, without limitation, nucleic acid 
molecules having a nucleic acid sequence selected from the group consisting of SEQ ID NOs: 1- 
1096 and complements thereof. A subset of the nucleic acid molecules of the present invention 
includes nucleic acid molecules that encode a protein or fragment thereof. Another subset of the 
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nucleic acid molecules of the present invention are cDNA molecules. Another subset of the 
nucleic acid molecules of the present invention includes nucleic acid molecules that are marker 
molecules. A further subset of the nucleic acid molecules of the present invention are those 
nucleic acid molecules having promoter sequences. 
5 Fragment nucleic acid molecules may comprise significant portion(s) of, or indeed most 

of, these nucleic acid molecules. In preferred embodiments, the fragments may comprise smaller 
polynucleotides, e.g., oligonucleotides having from about 20 to about 250 nucleotide residues 
and more preferably, about 20 to about 100 nucleotide residues, or about 40 to about 60 
nucleotide residues. In another preferred embodiment, fragment molecules may be at least 15 
10 nucleotides, at least 30 nucleotides, at least 50 nucleotides, or at least 100 nucleotides. 
5 The term "substantially purified," as used herein, refers to a molecule separated from 

III substantially all other molecules normally associated with it in its native state. More preferably a 
.4^ substantially purified molecule is the predominant species present in a preparation. A 
'Zl substantially purified molecule may be greater than 60% free, preferably 75% free, more 
13 preferably 90% free, and most preferably 95% free from the other molecules (exclusive of 
2 solvent) present in the natural mdxture. The term "substantially purified" is not intended to 
W encompass molecules present in their native state. 

The agents of the present invention will preferably be "biologically active" with respect
to either a structural attribute, such as the capacity of a nucleic acid to hybridize to another
nucleic acid molecule, or the ability of a protein to be bound by an antibody (or to compete with
another molecule for such binding). Alternatively, such an attribute may be catalytic and thus
involve the capacity of the agent to mediate a chemical reaction or response.

The agents of the present invention may also be recombinant. As used herein, the term
recombinant describes (a) nucleic acid molecules that are constructed or modified outside of
cells and that can replicate or function in a living cell, (b) molecules that result from the
transcription, replication or translation of recombinant nucleic acid molecules, or (c) organisms
that contain recombinant nucleic acid molecules or are modified using recombinant nucleic acid
molecules.

It is understood that the agents of the present invention may be labeled with reagents that
facilitate detection of the agent, e.g., fluorescent labels (Prober et al., Science 238:336-340
(1987); Albarella et al., EP 144914), chemical labels (Sheldon et al., U.S. Patent 4,582,789;
Albarella et al., U.S. Patent 4,563,417), and modified bases (Miyoshi et al., EP 119448),
including nucleotides with radioactive elements, e.g., 32P, 33P, 35S or 125I, such as 32P dCTP.

It is further understood that the present invention provides recombinant bacterial, animal,
fungal and plant cells and viral constructs comprising the agents of the present invention.

Nucleic acid molecules or fragments thereof of the present invention are capable of
specifically hybridizing to other nucleic acid molecules under certain circumstances. As used
herein, two nucleic acid molecules are said to be capable of specifically hybridizing to one
another if the two molecules are capable of forming an anti-parallel, double-stranded nucleic acid
structure. A nucleic acid molecule is said to be the "complement" of another nucleic acid
molecule if they exhibit "complete complementarity," i.e., each nucleotide in one sequence is
complementary to its base pairing partner nucleotide in another sequence. Two molecules are
said to be "minimally complementary" if they can hybridize to one another with sufficient
stability to permit them to remain annealed to one another under at least conventional
"low-stringency" conditions. Similarly, the molecules are said to be "complementary" if they can
hybridize to one another with sufficient stability to permit them to remain annealed to one
another under conventional "high-stringency" conditions. Nucleic acid molecules which
hybridize to other nucleic acid molecules, e.g., at least under low stringency conditions, are said
to be "hybridizable cognates" of the other nucleic acid molecules. Conventional stringency
conditions are described by Sambrook et al., Molecular Cloning, A Laboratory Manual, 2nd Ed.,
Cold Spring Harbor Press, Cold Spring Harbor, New York (1989) and by Haymes et al., Nucleic
Acid Hybridization, A Practical Approach, IRL Press, Washington, DC (1985). Departures from
complete complementarity are therefore permissible, as long as such departures do not
completely preclude the capacity of the molecules to form a double-stranded structure. Thus, in
order for a nucleic acid molecule to serve as a primer or probe it need only be sufficiently
complementary in sequence to be able to form a stable double-stranded structure under the
particular solvent and salt concentrations employed.
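
By way of illustration only, the sequence-level meaning of "complete complementarity" for an
anti-parallel duplex can be sketched as follows. The function names are hypothetical, only
the four standard DNA bases are handled, and the paired fraction reported here is not a
predictor of hybridization, which depends on the salt and temperature conditions discussed
in this section.

```python
# Illustrative sketch only: Watson-Crick pairing for unambiguous DNA bases.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the anti-parallel complement of a DNA sequence."""
    return "".join(PAIR[base] for base in reversed(seq.upper()))


def is_complete_complement(a: str, b: str) -> bool:
    """True if every nucleotide of a pairs with its partner in b (anti-parallel)."""
    return len(a) == len(b) and reverse_complement(a) == b.upper()


def paired_fraction(a: str, b: str) -> float:
    """Fraction of positions that pair when a is aligned anti-parallel to b."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    b_reversed = b.upper()[::-1]
    paired = sum(PAIR[x] == y for x, y in zip(a.upper(), b_reversed))
    return paired / len(a)
```

A departure from complete complementarity shows up as a paired fraction below 1.0; whether
such a pair remains annealed is an experimental question under the stated conditions.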

Appropriate stringency conditions which promote DNA hybridization, for example, 6.0 X
sodium chloride/sodium citrate (SSC) at about 45°C, followed by a wash of 2.0 X SSC at 50°C,
are known to those skilled in the art or can be found in Current Protocols in Molecular Biology,
John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. For example, the salt concentration in the wash
step can be selected from a low stringency of about 2.0 X SSC at 50°C to a high stringency of
about 0.2 X SSC at 50°C. In addition, the temperature in the wash step can be increased from
low stringency conditions at room temperature, about 22°C, to high stringency conditions at
about 65°C. Both temperature and salt may be varied, or either the temperature or the salt
concentration may be held constant while the other variable is changed.

In a preferred embodiment, a nucleic acid of the present invention will specifically
hybridize to one or more of the nucleic acid molecules set forth in SEQ ID NO: 1 through SEQ
ID NO: 1096 or complements thereof under moderately stringent conditions, for example at
about 2.0 X SSC and about 65°C.

In a particularly preferred embodiment, a nucleic acid of the present invention will
include those nucleic acid molecules that specifically hybridize to one or more of the nucleic
acid molecules set forth in SEQ ID NO: 1 through SEQ ID NO: 1096 or complements thereof
under high stringency conditions such as 0.2 X SSC and about 65°C.

In one aspect of the present invention, the nucleic acid molecules of the present invention
comprise one or more of the nucleic acid sequences set forth in SEQ ID NO: 1 through SEQ ID
NO: 1096 or complements thereof or fragments of either. In another aspect of the present
invention, one or more of the nucleic acid molecules of the present invention share at least 60%
sequence identity with one or more of the nucleic acid sequences set forth in SEQ ID NO: 1
through SEQ ID NO: 1096 or complements thereof or fragments of either. In a further aspect of
the present invention, one or more of the nucleic acid molecules of the present invention share at
least 70% or more, e.g., at least 80%, sequence identity with one or more of the nucleic acid
sequences set forth in SEQ ID NO: 1 through SEQ ID NO: 1096 or complements thereof or
fragments of either. In a more preferred aspect of the present invention, one or more of the
nucleic acid molecules of the present invention share at least 90% or more, e.g., at least 95% and
up to 100%, sequence identity with one or more of the nucleic acid sequences set forth in SEQ ID
NO: 1 through SEQ ID NO: 1096 or complements thereof or fragments of either.

As used herein "sequence identity" refers to the extent to which two optimally aligned
polynucleotide or peptide sequences are invariant throughout a window of alignment of
components, e.g., nucleotides or amino acids. An "identity fraction" for aligned segments of a
test sequence and a reference sequence is the number of identical components which are shared
by the two aligned sequences divided by the total number of components in the reference sequence
segment, i.e., the entire reference sequence or a smaller defined part of the reference sequence.
"Percent identity" is the identity fraction times 100.
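
The arithmetic of the foregoing definition may be sketched as below. This is a minimal
illustration that assumes the two sequences have already been aligned (gaps written as "-");
the function name and gap symbol are assumptions made for the example, and the sketch does
not reproduce the BLAST programs by which percent identity is determined for purposes of
this invention.

```python
def percent_identity(test_aligned: str, ref_aligned: str, gap: str = "-") -> float:
    """Identity fraction times 100 for two pre-aligned sequences.

    The denominator is the number of components in the reference segment,
    so gap columns in the reference are not counted.
    """
    if len(test_aligned) != len(ref_aligned):
        raise ValueError("aligned sequences must have the same length")
    test_aligned = test_aligned.upper()
    ref_aligned = ref_aligned.upper()
    ref_components = sum(1 for r in ref_aligned if r != gap)
    if ref_components == 0:
        raise ValueError("reference segment contains no components")
    identical = sum(
        1 for t, r in zip(test_aligned, ref_aligned) if r != gap and t == r
    )
    return 100.0 * identical / ref_components
```

For example, a test segment differing from an eight-nucleotide reference segment at one
position has an identity fraction of 7/8, i.e., 87.5 percent identity.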

Useful methods for determining sequence identity are disclosed in Guide to Huge
Computers, Martin J. Bishop, ed., Academic Press, San Diego, 1994, and Carillo, H., and Lipton,
D., SIAM J Applied Math (1988) 48: 1073. More particularly, preferred computer programs for
determining sequence identity include the Basic Local Alignment Search Tool (BLAST)
programs which are publicly available from the National Center for Biotechnology Information
(NCBI) at the National Library of Medicine, National Institutes of Health, Bethesda, Md. 20894;
see BLAST Manual, Altschul et al., NCBI, NLM, NIH; Altschul et al., J. Mol. Biol. 215:403-410
(1990); version 2.0 or higher of BLAST programs allows the introduction of gaps (deletions and
insertions) into alignments; BLASTX can be used to determine sequence identity between a
polynucleotide sequence query and a protein sequence database; and BLASTN can be used to
determine sequence identity between polynucleotide sequences.

For purposes of this invention "percent identity" shall be determined using BLASTX
version 2.0.14 (default parameters), BLASTN version 2.0.14, or BLASTP 2.0.14.

A particularly preferred group of nucleic acid sequences are those present in the soybean
insert of the clones set forth in Table 6 below.



TABLE 6

             Names of Clones Containing the Specified Gene
Line         Rhg4             rhg1/frag 1      rhg1/frag 2
-----------  ---------------  ---------------  ---------------
Forrest      Forrest 1        Forrest 7        Forrest13
Peking       Peking 1         Peking 7         Peking 13
Pickett      Pickett 1        Pickett 7        Pickett 13
PI84751      PI84751.1        PI 84751.7       PI 84751.13
PI87631      PI 87631.1       PI 87631.7       PI 87631.13
PI87631-1    PI 87631-1.1     -                PI 87631-1.13
PI88788R     PI88788R.1       PI 88788R.7      PI88788R.13
PI89772      -                -                PI 89772.13
PI90763      -                PI 90763.7       PI 90763.13
PI200499     PI 200499.1      PI 200499.7      PI 200499.13
PI209332     PI 209332.1      -                PI 209332.13
PI404166     PI 404166.1      PI 404166.7      PI 404166.13
PI404198A    -                PI404198A.7      PI404198A.13
PI404198B    PI404198B.1      PI404198B.7      PI404198B.13
PI437654     PI 437654.1      PI 437654.7      PI 437654.13
PI438489B    PI 438489.1      PI 438489.7      PI438489B.13
PI467312     PI 467312.1      PI 467312.7      PI 467312.13
PI507354     PI 507354.1      PI 507354.7      PI 507354.13
PI423871     PI 423871.1      PI 423871.7      PI 423871.13
PI407922     -                PI 407922.7      PI 407922.13
PI360843     PI 360843.1      PI 360843.7      PI 360843.13
A2869        A2869.1          A 2869.7         A2869.13
A2069        A2069.1          -                A2069.13
Jack         JACK11           -                JACK13
Will         WILL1            WILL.7           WILL13
Minsoy       Minsoy 1         Minsoy.7         MINSOY13
Noir         Noir1            Noir.7           NOIR13
Hutcheson    Hutcheson 1      Hutcheson.7      Hutcheson.13
A1923        A1923.1          A1923.7          A1923.13
A2704        -                A2704.7          A2704.13
Essex        Essex 1          Essex.7          ESSEX13
A3244        A3244.1          A 3244.7         A3244.13
Lee74        Lee74.1          Lee74.7          Lee74.13
PI437654     -                R107C17.7        R107C17.13

(A hyphen indicates that no clone is listed for that gene.)

Table 6 shows clones comprising rhg1 and Rhg4 sequences. The "Line" column
indicates the cultivar from which the sequence in the clone is derived. The Rhg4, rhg1/frag 1,
and rhg1/frag 2 columns show the clones derived from the lines that have the Rhg4, rhg1
fragment 1, or rhg1 fragment 2, respectively. Rhg4 is amplified with SEQ ID NOs: 48 and 49,
which produces a 3.5 kb product; rhg1 fragment 1 is amplified with SEQ ID NOs: 24 and 25,
which produces a 2.9 kb product; and rhg1 fragment 2 is amplified with SEQ ID NOs: 26 and
27, which produces a 1.75 kb product. All fragments are subcloned into a pCR4-TOPO vector.

(i) Nucleic Acid Molecules Encoding Proteins or Fragments Thereof 

A) rhg1

The present invention includes nucleic acid molecules that code for an rhg1 protein or
fragment thereof. Examples of such nucleic acid molecules include those that code for the
proteins set forth in SEQ ID NOs: 1097, 1100, 1098, 1101, and 1102-1115. Examples of
illustrative fragment molecules include, without limitation, an extracellular LRR domain (rhg1,
v.1, SEQ ID NO: 1097, residues 164-457; rhg1, v.2, SEQ ID NO: 1098, residues 141-434), a
transmembrane domain (rhg1, v.1, SEQ ID NO: 1097, residues 508-530; rhg1, v.2, SEQ ID NO:
1098, residues 33-51 and 485-507), and an STK domain (rhg1, v.1, SEQ ID NO: 1097, residues
578-869; rhg1, v.2, SEQ ID NO: 1098, residues 555-846). Examples of illustrative nucleic acid
molecules include SEQ ID NOs: 5, 6, 8-23, and 28-43.
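
As a hedged illustration of how the residue ranges recited above delimit fragment molecules,
the sketch below treats the ranges for SEQ ID NO: 1097 as 1-based, inclusive coordinates
(an assumption; the coordinate convention is not restated here). The dictionary and function
names are hypothetical, and no actual sequence from the Sequence Listing is reproduced.

```python
# Residue ranges recited for rhg1 v.1 (SEQ ID NO: 1097); assumed 1-based and inclusive.
RHG1_V1_DOMAINS = {
    "LRR": (164, 457),
    "transmembrane": (508, 530),
    "STK": (578, 869),
}


def extract_domain(protein: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not (1 <= start <= end <= len(protein)):
        raise ValueError("residue range lies outside the sequence")
    return protein[start - 1:end]


def domain_lengths(domains: dict) -> dict:
    """Number of residues spanned by each named domain."""
    return {name: end - start + 1 for name, (start, end) in domains.items()}
```

Under that convention the recited LRR, transmembrane, and STK ranges span 294, 23, and 292
residues, respectively.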

B) Rhg4 

The present invention includes nucleic acid molecules that code for an Rhg4 protein or
fragment thereof. Examples of such nucleic acid molecules include those that code for the
proteins set forth in SEQ ID NOs: 1099 and 1116-1119. Examples of illustrative fragment
molecules include, without limitation, an extracellular LRR domain (SEQ ID NO: 1099, residues
34-44), a transmembrane domain (SEQ ID NO: 1099, residues 449-471), and an STK domain
(SEQ ID NO: 1099, residues 531-830). Examples of illustrative nucleic acid molecules include
SEQ ID NOs: 7, 44-47, and 50-53.

C) rhg1 and Rhg4

In another further aspect of the present invention, nucleic acid molecules of the present
invention can comprise sequences which differ from those encoding a protein or fragment
thereof in SEQ ID NO: 1097 through SEQ ID NO: 1119 due to the fact that the different nucleic
acid sequence encodes a protein having one or more conservative amino acid changes. It is
understood that codons capable of coding for such conservative amino acid substitutions are
known in the art.

It is well known in the art that one or more amino acids in a native sequence can be
substituted with another amino acid(s), the charge and polarity of which are similar to that of the
native amino acid, i.e., a conservative amino acid substitution. Conserved substitutions for an
amino acid within the native polypeptide sequence can be selected from other members of the
class to which the naturally occurring amino acid belongs. Amino acids can be divided into the
following four groups: (1) acidic amino acids, (2) basic amino acids, (3) neutral polar amino
acids, and (4) neutral nonpolar amino acids. Representative amino acids within these various
groups include, but are not limited to: (1) acidic (negatively charged) amino acids such as
aspartic acid and glutamic acid; (2) basic (positively charged) amino acids such as arginine,
histidine, and lysine; (3) neutral polar amino acids such as glycine, serine, threonine, cysteine,
cystine, tyrosine, asparagine, and glutamine; and (4) neutral nonpolar (hydrophobic) amino acids
such as alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan, and methionine.

Conservative amino acid changes within the native polypeptide sequence can be made
by substituting one amino acid within one of these groups with another amino acid within the
same group. Biologically functional equivalents of the proteins or fragments thereof of the
present invention can have ten or fewer conservative amino acid changes, more preferably seven
or fewer conservative amino acid changes, and most preferably five or fewer conservative amino
acid changes. The encoding nucleotide sequence will thus have corresponding base
substitutions, permitting it to encode biologically functional equivalent forms of the proteins or
fragments of the present invention.
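
The four groups recited above can be expressed as a simple lookup, sketched below under
stated assumptions: residues are given in one-letter code, the two sequences are of equal
length and already in register, and cystine (a disulfide-linked pair of cysteines) is omitted
because it has no one-letter code of its own. All names are hypothetical.

```python
# Groups as recited in the text, in one-letter amino acid code.
GROUPS = {
    "acidic": set("DE"),
    "basic": set("RHK"),
    "neutral polar": set("GSTCYNQ"),
    "neutral nonpolar": set("ALIVPFWM"),
}
GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def count_changes(native: str, variant: str) -> tuple:
    """Return (conservative, non_conservative) substitution counts."""
    if len(native) != len(variant):
        raise ValueError("sequences must be the same length")
    conservative = non_conservative = 0
    for n, v in zip(native.upper(), variant.upper()):
        if n == v:
            continue
        if GROUP_OF[n] == GROUP_OF[v]:
            conservative += 1  # same group: conservative change
        else:
            non_conservative += 1
    return conservative, non_conservative
```

A variant could then be compared against the stated preferences (ten, seven, or five or fewer
conservative changes) by inspecting the first element of the returned pair.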

It is understood that certain amino acids may be substituted for other amino acids in a 
protein structure without appreciable loss of interactive binding capacity with structures such as, 
for example, antigen-binding regions of antibodies or binding sites on substrate molecules. 
Because it is the interactive capacity and nature of a protein that defines that protein's biological 
functional activity, certain amino acid sequence substitutions can be made in a protein sequence 
and, of course, its underlying DNA coding sequence and, nevertheless, obtain a protein with like 
properties. It is thus contemplated by the inventors that various changes may be made in the 
peptide sequences of the proteins or fragments of the present invention, or corresponding DNA 
sequences that encode said peptides, without appreciable loss of their biological utility or 
activity. It is understood that codons capable of coding for such amino acid changes are known 
in the art. 

In making such changes, the hydropathic index of amino acids may be considered. The 
importance of the hydropathic amino acid index in conferring interactive biological function on a 
protein is generally understood in the art (Kyte and Doolittle, J. Mol. Biol. 157:105-132 (1982)).
It is accepted that the relative hydropathic character of the amino acid contributes to the 
secondary structure of the resultant protein, which in turn defines the interaction of the protein 
with other molecules, for example, enzymes, substrates, receptors, DNA, antibodies, antigens, 
and the like. In making such changes, the substitution of amino acids whose hydropathic indices 
are within ±2 is preferred, those which are within ±1 are particularly preferred, and those within 
±0.5 are even more particularly preferred. 
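
For illustration, the preference tiers above can be checked against the hydropathic index
values published by Kyte and Doolittle (1982). The function name and the numeric return
convention are assumptions made for this sketch; the index values are those of the cited
reference.

```python
# Kyte-Doolittle hydropathic indices, one-letter amino acid code.
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_tier(native: str, substitute: str):
    """Return the tightest tier (0.5, 1.0 or 2.0) containing the index difference.

    Returns None when the indices differ by more than 2.
    """
    # Round to one decimal place to avoid floating-point noise at tier boundaries.
    diff = round(abs(HYDROPATHY[native] - HYDROPATHY[substitute]), 1)
    for tier in (0.5, 1.0, 2.0):
        if diff <= tier:
            return tier
    return None
```

For instance, isoleucine (4.5) to valine (4.2) falls within ±0.5, whereas isoleucine to
arginine (-4.5) falls outside ±2.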

It is also understood in the art that the substitution of like amino acids can be made 
effectively on the basis of hydrophilicity. U.S. Patent 4,554,101 states that the greatest local
average hydrophilicity of a protein, as governed by the hydrophilicity of its adjacent amino acids,
correlates with a biological property of the protein. In a further aspect of the present invention, 
one or more of the nucleic acid molecules of the present invention differ in nucleic acid sequence 
from those encoding a peptide set forth in SEQ ID NO: 1097 through SEQ ID NO: 1119 or 
fragment thereof due to the fact that one or more codons encoding an amino acid has been 
substituted for a codon that encodes a nonessential substitution of the amino acid originally 
encoded. 

Agents of the invention include nucleic acid molecules that encode at least about a 
contiguous 10 amino acid region of a protein of the present invention, more preferably at least 
about a contiguous 11 to 14 or larger amino acid region of a protein of the present invention. It
is understood that the present invention includes nucleic acid molecules that specifically 
hybridize or exhibit a particular identity to the nucleic acid molecules described in (i). See (a) 
above. 

(ii) Nucleic Acid Molecule Markers and Collections of Such Molecules 

One aspect of the present invention concerns nucleic acid molecules of the present invention
that can act as markers. As used herein, a "marker" is an indicator for the presence of at
least one phenotype or polymorphism, such as single nucleotide polymorphisms (SNPs),
cleavable amplified polymorphic sequences (CAPS), amplified fragment length polymorphisms
(AFLPs), restriction fragment length polymorphisms (RFLPs), simple sequence repeats (SSRs), 
or random amplified polymorphic DNA (RAPDs). A "nucleic acid marker" as used herein 
means a nucleic acid molecule that is capable of being a marker for detecting a polymorphism or 
phenotype. 

In one embodiment of the present invention, the nucleic acid marker specifically
hybridizes to a nucleic acid molecule having a nucleic acid sequence selected from the group
consisting of SEQ ID NOs: 1-1096 and complements thereof. In a preferred embodiment, the
nucleic acid marker is capable of detecting an rhg1 SNP or INDEL set forth in Table 2. In a
preferred embodiment, the nucleic acid marker is capable of detecting an Rhg4 SNP or INDEL
set forth in Table 4. In another preferred embodiment the nucleic acid marker is a nucleic acid
molecule capable of acting as a PCR primer to amplify an rhg1 or Rhg4 coding region. Examples
of such primers include, without limitation, nucleic acid molecules having a nucleic acid
sequence set forth in SEQ ID NOs: 401-1096 and complements thereof. Such primers can be used
in pairs to amplify a region (amplicons, e.g., without limitation, SEQ ID NOs: 54-400) that can
be further investigated using techniques known in the art such as nucleic acid sequencing.
Preferred pairs are those with identical "Seq ID" (see Description of the Sequence Listing)
except for the fact that one "Seq ID" recites forward primer and one recites reverse primer.
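
The pairing rule just stated (identical "Seq ID" apart from the words forward and reverse)
can be sketched as a grouping step. The label strings used here are hypothetical
placeholders; the actual wording of the "Seq ID" entries is given in the Description of the
Sequence Listing and is not reproduced in this sketch.

```python
import re
from collections import defaultdict


def pair_primers(labels):
    """Pair labels that are identical except for 'forward' versus 'reverse'.

    Returns a dict mapping the shared label text to (forward, reverse).
    """
    groups = defaultdict(dict)
    for label in labels:
        match = re.search(r"\b(forward|reverse)\b", label, flags=re.IGNORECASE)
        if match is None:
            continue  # not a primer label under this naming assumption
        direction = match.group(1).lower()
        # The key is the label with the direction word removed, whitespace normalized.
        key = " ".join((label[:match.start()] + label[match.end():]).split())
        groups[key][direction] = label
    return {
        key: (found["forward"], found["reverse"])
        for key, found in groups.items()
        if "forward" in found and "reverse" in found
    }
```

Labels lacking a partner of the opposite direction are simply left out of the result.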

In another embodiment of the present invention, the nucleic acid marker specifically
hybridizes to a nucleic acid molecule having a sequence that is present on linkage group G
within 500 kb or 100 kb, more preferably within 50 kb, even more preferably within 25 kb of an
rhg1 allele, where the rhg1 allele is preferably a sensitive allele, and more preferably a sensitive
allele from A3244. In a preferred embodiment the nucleic acid marker specifically hybridizes to
a nucleic acid molecule having a sequence that is present on linkage group A2 within 500 kb or
100 kb, more preferably within 50 kb, even more preferably within 25 kb of an Rhg4 allele, where
the Rhg4 allele is preferably a sensitive allele, and more preferably a sensitive allele from
A3244.

As used herein, a "collection of nucleic acid molecules" is a population of nucleic acid
molecules where at least two, preferably all, of the nucleic acid molecules differ, at least in part,
in their nucleic acid sequence. It is understood that, as used herein, an individual species within
a collection of nucleic acid molecules may be physically separate or alternatively not physically
separate from one or more other species within the collection of nucleic acid molecules. An
example of a situation where individual species may be physically separate but considered a
collection of nucleic acid molecules is where more than two species are present in a single
location such as an array.






As used herein, where a collection of nucleic acid molecules is a marker for a particular 
attribute, the level, pattern, occurrence and/or absence of the nucleic acid molecules associated 
with the attribute are not required to be the same between species of the collection. For example, 
the increase in the level of a species when in combination with the decrease in a second species 
could be diagnostic for a particular attribute. In a preferred embodiment of the present invention,
the level, pattern, occurrence and/or absence of a nucleic acid molecule and/or collection of 
nucleic acid molecules of the present invention is a marker for SCN resistance. 

In one embodiment, the marker is any nucleic acid molecule that specifically hybridizes
to any nucleic acid sequence set forth herein. In another embodiment, the marker is a marker
capable of distinguishing among the haplotypes of either rhg1 or Rhg4. In yet another
embodiment, more than one marker is used to simultaneously distinguish more than one
haplotype. In a preferred embodiment, two, three, four, six, eight, twenty five or fifty or more
nucleic acid markers are used simultaneously. In another embodiment, one or more markers that
are capable of distinguishing among the haplotypes of rhg1 and one or more markers that are
capable of distinguishing among the haplotypes of Rhg4 are used together.

(iii) Nucleic acid molecules having promoter sequences and other regulatory sequences

The present invention includes nucleic acid molecules that are an rhg1 or Rhg4 promoter
or fragment thereof. Examples of such nucleic acid molecules include those set forth in SEQ ID
NO: 2, upstream of coordinate 45163 and SEQ ID NO: 3, upstream of coordinate 46798. As
used herein a promoter is a nucleic acid sequence that when joined with a coding region is
capable of expressing the protein or fragment thereof so encoded. In a preferred embodiment the
promoter sequence corresponds to between 500 nucleotides and 5,000 nucleotides or between
300 nucleotides and 700 nucleotides of the nucleic acid sequence set forth in SEQ ID NO: 2
between coordinates 45163 and 40163, or SEQ ID NO: 3 between coordinates 46798 and 41798,
or the nucleic acid sequence set forth in SEQ ID NO: 4 between coordinates 111805 and 106805.
Preferred partial promoter regions include the TATA box region, e.g., at coordinates 44234
through 44246 of SEQ ID NO: 2 and at coordinates 107826 through 107835 of SEQ ID NO: 4,
and CAAT box region, e.g., at coordinates 106243 through 106259 of SEQ ID NO: 4.

Other regulatory sequences include introns or 3' untranslated regions (3'UTRs)
associated with rhg1 and Rhg4. In a preferred embodiment, an intron is selected from a nucleic
acid comprising a sequence selected from SEQ ID NO: 2 (rhg1 v.1 at coordinates 45315-45449,
45510-46940, and 48764-48974), SEQ ID NO: 3 (rhg1 v.1 at coordinates 48764-48974) and
SEQ ID NO: 4 (Rhg4 at coordinates 113969-114683). In another preferred embodiment, a
3'UTR is located within 5,000 nucleotides, more preferably within 1000 nucleotides, in the 3'
direction of the last coding nucleotide of either rhg1 or Rhg4 (SEQ ID NO: 2, rhg1 v.1,
coordinate 49573; SEQ ID NO: 3, rhg1 v.1, coordinate 49573; SEQ ID NO: 4, Rhg4, coordinate
115204).

It is understood that the present invention includes nucleic acid molecules that
specifically hybridize or exhibit a particular identity to the nucleic acid molecules described in
(iii). See (a) above.

(b) Protein and Peptide Molecules

A class of agents comprises one or more of the protein or peptide molecules encoded by
SEQ ID NO: 1097 through SEQ ID NO: 1119 or one or more of the protein or fragment thereof
or peptide molecules encoded by other nucleic acid agents of the present invention. As used
herein, the terms "protein molecule" and "peptide molecule" mean any protein or protein
fragment or peptide or polypeptide molecule that comprises ten or more amino acids, preferably
at least 11 or 12 or more, more preferably at least 13 or 14 amino acids. It is well known in the art
that proteins may undergo modification, including post-translational modifications, such as, but
not limited to, disulfide bond formation, glycosylation, phosphorylation, or oligomerization.
Thus, as used herein, the terms "protein molecule" and "peptide molecule" include molecules
that are modified by any biological or non-biological process. The terms "amino acid" and
"amino acids" refer to all naturally occurring L-amino acids. This definition is meant to include
norleucine, ornithine, homocysteine, and homoserine.

One or more of the protein or peptide molecules may be produced via chemical synthesis,
or more preferably, by expression in a suitable bacterial or eukaryotic host. Suitable methods for
expression are described by Sambrook et al. (In: Molecular Cloning, A Laboratory Manual,
2nd Edition, Cold Spring Harbor Press, Cold Spring Harbor, New York (1989)), or similar texts.

Another class of agents comprises protein or peptide molecules encoded by SEQ ID NO:
1097 through SEQ ID NO: 1119 or complements thereof or, fragments or fusions thereof in
which non-essential, or not relevant, amino acid residues have been added, replaced, or deleted.
An example of such a homolog is a protein homolog of each soybean species, including but not
limited to alfalfa, barley, Brassica, broccoli, cabbage, citrus, garlic, oat, oilseed rape, onion,
canola, flax, pea, peanut, pepper, potato, rye, soybean, strawberry, sugarcane, sugarbeet,
soybean, maize, rice, cotton, sorghum, Arabidopsis, wheat, pine, fir, eucalyptus, apple, lettuce,
peas, lentils, grape, banana, tea, turf grasses, etc. Particularly preferred non-soybean plants to
utilize for the isolation of homologs would include alfalfa, barley, oat, oilseed rape, canola,
ornamentals, sugarcane, sugarbeet, soybean, maize, rice, cotton, sorghum, Arabidopsis, wheat,
potato, and turf grasses. Such a homolog can be obtained by any of a variety of methods. Most
preferably, as indicated above, one or more of the disclosed sequences (SEQ ID NO: 1 through
SEQ ID NO: 1096 or complements thereof) will be used to define a pair of primers that may be
used to isolate the protein homolog-encoding nucleic acid molecules from any desired species.
Such molecules can be expressed to yield protein homologs by recombinant means.

(c) Plant Constructs and Plant Transformants 

One or more of the nucleic acid molecules of the invention may be used in plant
transformation or transfection. Exogenous genetic material may be transferred into a plant cell
and the plant cell regenerated into a whole, fertile or sterile plant. Exogenous genetic material is
any genetic material, whether naturally occurring or otherwise, from any source that is capable of
being inserted into any organism. In a preferred embodiment the exogenous genetic material
includes a nucleic acid molecule of the present invention, preferably a nucleic acid molecule
having at least 20 nucleotides of a sequence selected from the group consisting of SEQ ID NO: 1
through SEQ ID NO: 1096 and complements thereof. In a preferred embodiment, the nucleic
acid molecule codes for a protein or fragment thereof described in Section (i). In another
preferred embodiment, the nucleic acid molecule is a promoter or fragment thereof described in
Section (iii).

Such genetic material may be transferred into either monocotyledons or dicotyledons
including, but not limited to, tomato, eggplant, maize, soybean, Arabidopsis, phaseolus, peanut,
alfalfa, wheat, rice, oat, sorghum, rye, tritordeum, millet, fescue, perennial ryegrass, sugarcane,
cranberry, papaya, banana, muskmelon, apple, cucumber, dendrobium, gladiolus,
chrysanthemum, liliacea, cotton, eucalyptus, sunflower, canola, turfgrass, sugarbeet, coffee and
dioscorea (Christou, In: Particle Bombardment for Genetic Engineering of Plants,
Biotechnology Intelligence Unit, Academic Press, San Diego, California (1996)).

In a preferred embodiment, the genetic material is transferred to a soybean. Preferred
soybeans to transfer an rhg1 SCN resistance allele are selected from the group consisting of
PI548402 (Peking), PI200499, A2869, Jack, A2069, PI209332 (No:4), PI404166
(Krasnoaarmejkaja), PI404198 (Sun huan do), PI437654 (Er-hej-jan), PI438489 (Chiquita),
PI507354 (Tokei 421), PI548655 (Forrest), PI548988 (Pickett), PI84751, PI437654, PI40792,
Pyramid, Nathan, AG2201, A3469, AG3901, A3904, AG4301, AG4401, AG4501, AG4601,
PION9492, PI88788, Dyer, Custer, Manokin, and Doles.

Preferred soybeans to transfer an Rhg4 SCN resistance allele are selected from the group
consisting of PI548402 (Peking), PI437654 (Er-hej-jan), PI438489 (Chiquita), PI507354 (Tokei
421), PI548655 (Forrest), PI548988 (Pickett), PI88788, PI404198 (Sun Huan Do), PI404166
(Krasnoaarmejkaja), Hartwig, Manokin, Doles, Dyer, and Custer.

Transfer of a nucleic acid that encodes a protein can result in overexpression of that
protein in a transformed cell or transgenic plant. One or more of the proteins or fragments
thereof encoded by nucleic acid molecules of the invention may be overexpressed in a
transformed cell or transformed plant. Such overexpression may be the result of transient or
stable transfer of the exogenous genetic material. Such overexpression can also result in
resistance to one or more races of SCN.

Exogenous genetic material may be transferred into a host cell by the use of a DNA
vector or construct designed for such a purpose. Design of such a vector is generally within the
skill of the art (See, Plant Molecular Biology: A Laboratory Manual, Clark (ed.), Springer, New
York (1997)).

A construct or vector may include a plant promoter to express the protein or protein
fragment of choice. A number of promoters, which are active in plant cells, have been described
in the literature. These include the nopaline synthase (NOS) promoter (Ebert et al., Proc. Natl.
Acad. Sci. (U.S.A.) 84:5745-5749 (1987)), the octopine synthase (OCS) promoter (which are
carried on tumor-inducing plasmids of Agrobacterium tumefaciens), the caulimovirus promoters
such as the cauliflower mosaic virus (CaMV) 19S promoter (Lawton et al., Plant Mol. Biol.
9:315-324 (1987)), and the CaMV 35S promoter (Odell et al., Nature 313:810-812 (1985)), the
figwort mosaic virus 35S-promoter, the light-inducible promoter from the small subunit of
ribulose-1,5-bisphosphate carboxylase (ssRUBISCO), the Adh promoter (Walker et al., Proc.
Natl. Acad. Sci. (U.S.A.) 84:6624-6628 (1987)), the sucrose synthase promoter (Yang et al., Proc.
Natl. Acad. Sci. (U.S.A.) 87:4144-4148 (1990)), the R gene complex promoter (Chandler et al.,
The Plant Cell 1:1175-1183 (1989)), and the chlorophyll a/b binding protein gene promoter, etc.
These promoters have been used to create DNA constructs that have been expressed in plants;
see, e.g., PCT publication WO 84/02913. The CaMV 35S promoters are preferred for use in
plants. Promoters known or found to cause transcription of DNA in plant cells can be used in the
invention.

For the purpose of expression in source tissues of the plant, such as the leaf, seed, root or
stem, it is preferred that the promoters utilized have relatively high expression in these specific
tissues. Tissue-specific expression of a protein of the present invention is a particularly preferred
embodiment. For this purpose, one may choose from a number of promoters for genes with
tissue- or cell-specific or -enhanced expression. Examples of such promoters reported in the
literature include the chloroplast glutamine synthetase GS2 promoter from pea (Edwards et al.,
Proc. Natl. Acad. Sci. (U.S.A.) 87:3459-3463 (1990)), the chloroplast fructose-1,6-biphosphatase
(FBPase) promoter from wheat (Lloyd et al., Mol. Gen. Genet. 225:209-216 (1991)), the nuclear
photosynthetic ST-LS1 promoter from potato (Stockhaus et al., EMBO J. 8:2445-2451 (1989)),
the STK (PAL) promoter and the glucoamylase (CHS) promoter from Arabidopsis thaliana.
Also reported to be active in photosynthetically active tissues are the ribulose-1,5-bisphosphate
carboxylase (RbcS) promoter from eastern larch (Larix laricina), the promoter for the cab gene,
cab6, from pine (Yamamoto et al., Plant Cell Physiol. 35:773-778 (1994)), the promoter for the
Cab-1 gene from wheat (Fejes et al., Plant Mol. Biol. 15:921-932 (1990)), the promoter for the
CAB-1 gene from spinach (Lubberstedt et al., Plant Physiol. 104:997-1006 (1994)), the promoter
for the cab1R gene from rice (Luan et al., Plant Cell 4:971-981 (1992)), the pyruvate,
orthophosphate dikinase (PPDK) promoter from maize (Matsuoka et al., Proc. Natl. Acad. Sci.
(U.S.A.) 90:9586-9590 (1993)), the promoter for the tobacco Lhcb1*2 gene (Cerdan et al., Plant
Mol. Biol. 33:245-255 (1997)), the Arabidopsis thaliana SUC2 sucrose-H+ symporter promoter
(Truernit et al., Planta 196:564-570 (1995)), and the promoter for the thylakoid membrane
proteins from spinach (psaD, psaF, psaE, PC, FNR, atpC, atpD, cab, rbcS). Other promoters for
the chlorophyll a/b-binding proteins may also be utilized in the invention, such as the promoters
for LhcB gene and PsbP gene from white mustard (Sinapis alba; Kretsch et al., Plant Mol. Biol.
28:219-229 (1995)).

For the purpose of expression in sink tissues of the plant, such as the tuber of the potato
plant, the fruit of tomato, or the seed of maize, wheat, rice and barley, it is preferred that the
promoters utilized in the invention have relatively high expression in these specific tissues. A
number of promoters for genes with tuber-specific or -enhanced expression are known, including
the class I patatin promoter (Bevan et al., EMBO J. 8:1899-1906 (1986); Jefferson et al., Plant
Mol. Biol. 14:995-1006 (1990)), the promoter for the potato tuber ADPGPP genes, both the large
and small subunits, the sucrose synthase promoter (Salanoubat and Belliard, Gene 60:47-56
(1987), Salanoubat and Belliard, Gene 84:181-185 (1989)), the promoter for the major tuber
proteins including the 22 kd protein complexes and proteinase inhibitors (Hannapel, Plant
Physiol. 101:703-704 (1993)), the promoter for the granule bound starch synthase gene (GBSS)
(Visser et al., Plant Mol. Biol. 17:691-699 (1991)), and other class I and II patatins promoters
(Koster-Topfer et al., Mol. Gen. Genet. 219:390-396 (1989); Mignery et al., Gene 62:27-44
(1988)).

Other promoters can also be used to express a protein or fragment thereof in specific
tissues, such as seeds or fruits. The promoter for β-conglycinin (Chen et al., Dev. Genet. 10:
112-122 (1989)) or other seed-specific promoters such as the napin and phaseolin promoters, can
be used. The zeins are a group of storage proteins found in maize endosperm. Genomic clones
for zein genes have been isolated (Pedersen et al., Cell 29:1015-1026 (1982)) and the promoters
from these clones, including the 15 kD, 16 kD, 19 kD, 22 kD, 27 kD and genes, could also be
used. Other promoters known to function, for example, in maize include the promoters for the
following genes: waxy, Brittle, Shrunken 2, Branching enzymes I and II, starch synthases,
debranching enzymes, oleosins, glutelins and sucrose synthases. A particularly preferred
promoter for maize endosperm expression is the promoter for the glutelin gene from rice, more
particularly the Osgt-1 promoter (Zheng et al., Mol. Cell Biol. 13:5829-5842 (1993)). Examples
of promoters suitable for expression in wheat include those promoters for the ADPglucose
pyrosynthase (ADPGPP) subunits, the granule bound and other starch synthase, the branching
and debranching enzymes, the embryogenesis-abundant proteins, the gliadins and the glutenins.
Examples of such promoters in rice include those promoters for the ADPGPP subunits, the
granule bound and other starch synthase, the branching enzymes, the debranching enzymes,
sucrose synthases and the glutelins. A particularly preferred promoter is the promoter for rice
glutelin, Osgt-1. Examples of such promoters for barley include those for the ADPGPP subunits,
the granule bound and other starch synthase, the branching enzymes, the debranching enzymes,
sucrose synthases, the hordeins, the embryo globulins and the aleurone specific proteins.

Root specific promoters may also be used. An example of such a promoter is the
promoter for the acid chitinase gene (Samac et al., Plant Mol. Biol. 25:587-596 (1994)).
Expression in root tissue could also be accomplished by utilizing the root specific subdomains of
the CaMV35S promoter that have been identified (Lam et al., Proc. Natl. Acad. Sci. (U.S.A.)
86:7890-7894 (1989)). Other root cell specific promoters include those reported by Conkling et
al. (Conkling et al., Plant Physiol. 93:1203-1211 (1990)).

Additional promoters that may be utilized are described, for example, in U.S. Patent Nos.
5,378,619; 5,391,725; 5,428,147; 5,447,858; 5,608,144; 5,614,399; 5,633,441;
5,633,435; and 4,633,436. In addition, a tissue specific enhancer may be used (Fromm et al.,
The Plant Cell 1:977 (1989)).

Preferred promoters are those set forth in Section (a)(iii) of Agents.

Constructs or vectors may also include, with the coding region of interest, a nucleic acid
sequence that acts, in whole or in part, to terminate transcription of that region. A number of
such sequences have been isolated, including the Tr7 3' sequence and the NOS 3' sequence
(Ingelbrecht et al., The Plant Cell 1:671-680 (1989); Bevan et al., Nucleic Acids Res. 11:369-385
(1983)).

A vector or construct may also include regulatory elements. Examples of such include
the Adh intron 1 (Callis et al., Genes and Develop. 1:1183-1200 (1987)), the sucrose synthase
intron (Vasil et al., Plant Physiol. 91:1575-1579 (1989)) and the TMV omega element (Gallie et
al., The Plant Cell 1:301-311 (1989)). These and other regulatory elements may be included
when appropriate.

A vector or construct may also include a selectable marker. Selectable markers may also
be used to select for plants or plant cells that contain the exogenous genetic material. Examples
of such include, but are not limited to: a neomycin phosphotransferase gene (U.S. Patent
5,034,322), which codes for kanamycin resistance and can be selected for using kanamycin,
G418, etc.; a bar gene which codes for bialaphos resistance; genes which encode glyphosate
resistance (U.S. Patents 4,940,835; 5,188,642; 4,971,908; 5,627,061); a nitrilase gene which
confers resistance to bromoxynil (Stalker et al., J. Biol. Chem. 263:6310-6314 (1988)); a mutant
acetolactate synthase gene (ALS) which confers imidazolinone or sulphonylurea resistance
(European Patent Application 154,204 (Sept. 11, 1985)); and a methotrexate resistant DHFR
gene (Thillet et al., J. Biol. Chem. 263:12500-12508 (1988)).

A vector or construct may also include DNA sequence which encodes a transit peptide.
Incorporation of a suitable chloroplast transit peptide may also be employed (European Patent
Application Publication Number 0218571). Translational enhancers may also be incorporated as
part of the vector DNA. DNA constructs could contain one or more 5' non-translated leader
sequences which may serve to enhance expression of the gene products from the resulting
mRNA transcripts. Such sequences may be derived from the promoter selected to express the
gene or can be specifically modified to increase translation of the mRNA. Such regions may
also be obtained from viral RNAs, from suitable eukaryotic genes, or from a synthetic gene
sequence. For a review of optimizing expression of transgenes, see Koziel et al., Plant Mol.
Biol. 32:393-405 (1996).

A vector or construct may also include a screenable marker. Screenable markers may be
used to monitor expression. Exemplary screenable markers include: a β-glucuronidase or uidA
gene (GUS) which encodes an enzyme for which various chromogenic substrates are known
(Jefferson, Plant Mol. Biol. Rep. 5:387-405 (1987); Jefferson et al., EMBO J. 6:3901-3907
(1987)); an R-locus gene, which encodes a product that regulates the production of anthocyanin
pigments (red color) in plant tissues (Dellaporta et al., Stadler Symposium 11:263-282 (1988)); a
β-lactamase gene (Sutcliffe et al., Proc. Natl. Acad. Sci. (U.S.A.) 75:3737-3741 (1978)), a gene
which encodes an enzyme for which various chromogenic substrates are known (e.g., PADAC, a
chromogenic cephalosporin); a luciferase gene (Ow et al., Science 234:856-859 (1986)); a xylE
gene (Zukowsky et al., Proc. Natl. Acad. Sci. (U.S.A.) 80:1101-1105 (1983)) which encodes a
catechol dioxygenase that can convert chromogenic catechols; an α-amylase gene (Ikatu et al.,
Bio/Technol. 8:241-242 (1990)); a tyrosinase gene (Katz et al., J. Gen. Microbiol. 129:2703-2714
(1983)) which encodes an enzyme capable of oxidizing tyrosine to DOPA and dopaquinone
which in turn condenses to melanin; an α-galactosidase, which will turn a chromogenic
α-galactose substrate.

Included within the terms "selectable or screenable marker genes" are also genes which
encode a secretable marker whose secretion can be detected as a means of identifying or
selecting for transformed cells. Examples include markers which encode a secretable antigen
that can be identified by antibody interaction, or even secretable enzymes which can be detected
catalytically. Secretable proteins fall into a number of classes, including small, diffusible
proteins which are detectable (e.g., by ELISA), small active enzymes which are detectable in
extracellular solution (e.g., α-amylase, β-lactamase, phosphinothricin transferase), or proteins
which are inserted or trapped in the cell wall (such as proteins which include a leader sequence
such as that found in the expression unit of extensin or tobacco PR-S). Other possible
selectable and/or screenable marker genes will be apparent to those of skill in the art.
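
The elements of a vector or construct described above (promoter, coding region, transcription terminator, regulatory elements, transit peptide, and selectable or screenable markers) can be summarized as a simple record. The following is only an illustrative sketch: the class name `Construct`, its field names, and the ordering used in `cassette_order` are assumptions made for this example and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Construct:
    """Hypothetical record of the parts a plant expression construct may carry."""
    promoter: str                      # e.g. "CaMV 35S"
    coding_region: str                 # nucleic acid encoding the protein of interest
    terminator: Optional[str] = None   # e.g. "NOS 3'"
    regulatory_elements: List[str] = field(default_factory=list)  # e.g. "Adh intron 1"
    transit_peptide: Optional[str] = None
    selectable_marker: Optional[str] = None
    screenable_marker: Optional[str] = None

    def cassette_order(self) -> List[str]:
        """Return the expression-cassette parts that are present.

        The order (promoter, regulatory elements, transit peptide, coding
        region, terminator) is one plausible layout assumed for illustration.
        """
        parts = [self.promoter, *self.regulatory_elements]
        if self.transit_peptide:
            parts.append(self.transit_peptide)
        parts.append(self.coding_region)
        if self.terminator:
            parts.append(self.terminator)
        return parts


example = Construct(
    promoter="CaMV 35S",
    coding_region="protein of interest",
    terminator="NOS 3'",
    regulatory_elements=["Adh intron 1"],
    selectable_marker="neomycin phosphotransferase",
)
print(example.cassette_order())
```

Marker genes are kept as separate fields here because they are typically expressed from their own cassettes rather than fused to the coding region of interest.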

There are many methods for introducing transforming nucleic acid molecules into plant
cells. Suitable methods are believed to include virtually any method by which nucleic acid
molecules may be introduced into a cell, such as by Agrobacterium infection or direct delivery of
nucleic acid molecules such as, for example, by PEG-mediated transformation, by
electroporation or by acceleration of DNA coated particles, etc. (Potrykus, Ann. Rev. Plant
Physiol. Plant Mol. Biol. 42:205-225 (1991); Vasil, Plant Mol. Biol. 25:925-937 (1994)). For
example, electroporation has been used to transform maize protoplasts (Fromm et al., Nature
319:791-793 (1986)).

Other vector systems suitable for introducing transforming DNA into a host plant cell
include but are not limited to binary artificial chromosome (BIBAC) vectors (Hamilton et al.,
Gene 200:107-116 (1997)); and transfection with RNA viral vectors (Della-Cioppa et al., Ann.
N.Y. Acad. Sci. (1996), 792 (Engineering Plants for Commercial Products and Applications), 57-
61). Additional vector systems also include plant selectable YAC vectors such as those
described in Mullen et al., Molecular Breeding 4:449-457 (1988).

Technology for introduction of DNA into cells is well known to those of skill in the art.
Four general methods for delivering a gene into cells have been described: (1) chemical methods
(Graham and van der Eb, Virology 54:536-539 (1973)); (2) physical methods such as
microinjection (Capecchi, Cell 22:479-488 (1980)), electroporation (Wong and Neumann,
Biochem. Biophys. Res. Commun. 107:584-587 (1982); Fromm et al., Proc. Natl. Acad. Sci.
(U.S.A.) 82:5824-5828 (1985); U.S. Patent No. 5,384,253); and the gene gun (Johnston and
Tang, Methods Cell Biol. 43:353-365 (1994)); (3) viral vectors (Clapp, Clin. Perinatol. 20:155-
168 (1993); Lu et al., J. Exp. Med. 178:2089-2096 (1993); Eglitis and Anderson, Biotechniques
6:608-614 (1988)); and (4) receptor-mediated mechanisms (Curiel et al., Hum. Gen. Ther. 3:147-
154 (1992), Wagner et al., Proc. Natl. Acad. Sci. (U.S.A.) 89:6099-6103 (1992)).

Acceleration methods that may be used include, for example, microprojectile
bombardment and the like. One example of a method for delivering transforming nucleic acid
molecules to plant cells is microprojectile bombardment. This method has been reviewed by
Yang and Christou (eds.), Particle Bombardment Technology for Gene Transfer, Oxford Press,
Oxford, England (1994). Non-biological particles (microprojectiles) may be coated with
nucleic acids and delivered into cells by a propelling force. Exemplary particles include those
comprised of tungsten, gold, platinum and the like.

A particular advantage of microprojectile bombardment, in addition to it being an
effective means of reproducibly transforming monocots, is that neither the isolation of
protoplasts (Christou et al., Plant Physiol. 87:671-674 (1988)) nor susceptibility to
Agrobacterium infection is required. An illustrative embodiment of a method for delivering
DNA into maize cells by acceleration is a biolistics α-particle delivery system, which can be
used to propel particles coated with DNA through a screen, such as a stainless steel or Nytex
screen, onto a filter surface covered with corn cells cultured in suspension. Gordon-Kamm et al.
describe the basic procedure for coating tungsten particles with DNA (Gordon-Kamm et al.,
Plant Cell 2:603-618 (1990)). The screen disperses the tungsten nucleic acid particles so that
they are not delivered to the recipient cells in large aggregates. A particle delivery system
suitable for use with the invention is the helium acceleration PDS-1000/He gun, which is available
from Bio-Rad Laboratories (Bio-Rad, Hercules, California) (Sanford et al., Technique 3:3-16 (1991)).

For the bombardment, cells in suspension may be concentrated on filters. Filters 
containing the cells to be bombarded are positioned at an appropriate distance below the 
5 microprojectile stopping plate. If desired, one or more screens are also positioned between the 
gun and the cells to be bombarded. 

Alternatively, immature embryos or other target cells may be arranged on solid culture
medium. The cells to be bombarded are positioned at an appropriate distance below the
microprojectile stopping plate. If desired, one or more screens are also positioned between the
acceleration device and the cells to be bombarded. Through the use of techniques set forth
herein one may obtain up to 1000 or more foci of cells transiently expressing a screenable or
selectable marker gene. The number of cells in a focus which express the exogenous gene
product 48 hours post-bombardment often ranges from one to ten and averages one to three.

In bombardment transformation, one may optimize the pre-bombardment culturing
conditions and the bombardment parameters to yield the maximum numbers of stable
transformants. Both the physical and biological parameters for bombardment are important in
this technology. Physical factors are those that involve manipulating the DNA/microprojectile
precipitate or those that affect the flight and velocity of either the macro- or microprojectiles.
Biological factors include all steps involved in manipulation of cells before and immediately
after bombardment, the osmotic adjustment of target cells to help alleviate the trauma associated
with bombardment and also the nature of the transforming DNA, such as linearized DNA or
intact supercoiled plasmids. It is believed that pre-bombardment manipulations are especially
important for successful transformation of immature embryos.

In another alternative embodiment, plastids can be stably transformed. Methods
disclosed for plastid transformation in higher plants include the particle gun delivery of DNA
containing a selectable marker and targeting of the DNA to the plastid genome through
homologous recombination (Svab et al., Proc. Natl. Acad. Sci. (U.S.A.) 87:8526-8530 (1990);
Svab and Maliga, Proc. Natl. Acad. Sci. (U.S.A.) 90:913-917 (1993); Staub and Maliga, EMBO
J. 12:601-606 (1993); U.S. Patents 5,451,513 and 5,545,818).

Accordingly, it is contemplated that one may wish to adjust various aspects of the
bombardment parameters in small-scale studies to fully optimize the conditions. One may
particularly wish to adjust physical parameters such as gap distance, flight distance, tissue
distance and helium pressure. One may also minimize the trauma reduction factors by
modifying conditions which influence the physiological state of the recipient cells and which
may therefore influence transformation and integration efficiencies. For example, the osmotic
state, tissue hydration and the subculture stage or cell cycle of the recipient cells may be adjusted
for optimum transformation. The execution of other routine adjustments will be known to those
of skill in the art in light of the present disclosure.

Agrobacterium-mediated transfer is a widely applicable system for introducing genes into
plant cells because the DNA can be introduced into whole plant tissues, thereby bypassing the
need for regeneration of an intact plant from a protoplast. The use of Agrobacterium-mediated
plant integrating vectors to introduce DNA into plant cells is well known in the art. See, for
example the methods described by Fraley et al., Bio/Technology 3:629-635 (1985) and Rogers et
al., Methods Enzymol. 153:253-277 (1987). Further, the integration of the T-DNA is a relatively
precise process resulting in few rearrangements. The region of DNA to be transferred is defined
by the border sequences and intervening DNA is usually inserted into the plant genome as
described (Spielmann et al., Mol. Gen. Genet. 205:34 (1986)).

Modern Agrobacterium transformation vectors are capable of replication in E. coli as
well as Agrobacterium, allowing for convenient manipulations as described (Klee et al., In: Plant
DNA Infectious Agents, Hohn and Schell (eds.), Springer-Verlag, New York, pp. 179-203
(1985)). Moreover, technological advances in vectors for Agrobacterium-mediated gene transfer
have improved the arrangement of genes and restriction sites in the vectors to facilitate
construction of vectors capable of expressing various polypeptide coding genes. The vectors
described have convenient multi-linker regions flanked by a promoter and a polyadenylation site
for direct expression of inserted polypeptide coding genes and are suitable for present purposes
(Rogers et al., Methods Enzymol. 153:253-277 (1987)). In addition, Agrobacterium containing
both armed and disarmed Ti genes can be used for the transformations. In those plant strains
where Agrobacterium-mediated transformation is efficient, it is the method of choice because of
the facile and defined nature of the gene transfer.

A transgenic plant formed using Agrobacterium transformation methods typically
contains a single gene on one chromosome. Such transgenic plants can be referred to as being
heterozygous for the added gene. More preferred is a transgenic plant that is homozygous for the
added structural gene; i.e., a transgenic plant that contains two added genes, one gene at the same
locus on each chromosome of a chromosome pair. A homozygous transgenic plant can be
obtained by sexually mating (selfing) an independent segregant transgenic plant that contains a
single added gene, germinating some of the seed produced and analyzing the resulting plants
produced for the gene of interest.
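
The expected outcome of selfing a plant that carries a single added gene can be worked out with a short calculation. The sketch below assumes simple Mendelian segregation of one insert at one locus with equal transmission through both gametes; the function name `selfing_copy_number` and the allele labels "T" (transgene present) and "-" (transgene absent) are hypothetical names chosen for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def selfing_copy_number(parent=("T", "-")):
    """Distribution of transgene copy number among progeny of a selfed plant.

    `parent` holds the two alleles at the insertion locus. Each gamete
    receives either allele with probability 1/2 (assumed Mendelian).
    """
    dist = Counter()
    for egg, pollen in product(parent, repeat=2):
        copies = (egg == "T") + (pollen == "T")
        dist[copies] += Fraction(1, 4)
    return dict(dist)


progeny = selfing_copy_number()
for copies in (2, 1, 0):
    print(copies, "copies:", progeny[copies])
```

Under these assumptions one quarter of the progeny are homozygous for the added gene, one half carry a single copy, and one quarter carry none, which is why the resulting plants must be analyzed to identify the homozygotes.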

It is also to be understood that two different transgenic plants can also be mated to
produce offspring that contain two independently segregating, exogenous genes. Selfing of
appropriate progeny can produce plants that are homozygous for both added, exogenous genes
that encode a polypeptide of interest. Backcrossing to a parental plant and out-crossing with a
non-transgenic plant are also contemplated, as is vegetative propagation.
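
The arithmetic behind mating two different transgenic plants and then selfing the progeny can be sketched as follows. This is an illustration only: it assumes each parent carries a single copy of a different gene, that the two loci are unlinked and segregate independently, and that inheritance is Mendelian. The helper `cross_locus` and the constants `HEMI` and `NULL` are hypothetical names.

```python
from fractions import Fraction
from itertools import product


def cross_locus(parent1, parent2):
    """Copy-number distribution at one locus among progeny of parent1 x parent2."""
    dist = {}
    for a, b in product(parent1, parent2):
        copies = (a == "T") + (b == "T")
        dist[copies] = dist.get(copies, Fraction(0)) + Fraction(1, 4)
    return dist


HEMI = ("T", "-")   # one copy of the added gene at this locus
NULL = ("-", "-")   # no added gene at this locus

# Plant 1 carries gene A only; plant 2 carries gene B only.
locus_a = cross_locus(HEMI, NULL)
locus_b = cross_locus(NULL, HEMI)
f1_with_both = locus_a[1] * locus_b[1]

# Selfing an F1 plant that carries one copy of each gene.
selfed = cross_locus(HEMI, HEMI)
f2_homozygous_for_both = selfed[2] ** 2

print("F1 carrying both genes:", f1_with_both)
print("F2 homozygous for both genes:", f2_homozygous_for_both)
```

Under these assumptions one quarter of the first-generation offspring carry both genes, and one sixteenth of the selfed progeny of such a plant are homozygous for both.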

Transformation of plant protoplasts can be achieved using methods based on calcium
phosphate precipitation, polyethylene glycol treatment, electroporation and combinations of
these treatments (See, for example, Potrykus et al., Mol. Gen. Genet. 205:193-200 (1986); Lorz
et al., Mol. Gen. Genet. 199:178 (1985); Fromm et al., Nature 319:791 (1986); Uchimiya et al.,
Mol. Gen. Genet. 204:204 (1986); Marcotte et al., Nature 335:454-457 (1988)).

Application of these systems to different plant strains depends upon the ability to
regenerate that particular plant strain from protoplasts. Illustrative methods for the regeneration
of cereals from protoplasts are described (Fujimura et al., Plant Tissue Culture Letters 2:74
(1985); Toriyama et al., Theor. Appl. Genet. 205:34 (1986); Yamada et al., Plant Cell Rep. 4:85
(1986); Abdullah et al., Biotechnology 4:1087 (1986)).

To transform plant strains that cannot be successfully regenerated from protoplasts, other
ways to introduce DNA into intact cells or tissues can be utilized. For example, regeneration of
cereals from immature embryos or explants can be effected as described (Vasil, Biotechnology
6:397 (1988)). In addition, "particle gun" or high-velocity microprojectile technology can be
utilized (Vasil et al., Bio/Technology 10:667 (1992)).

Using the latter technology, DNA is carried through the cell wall and into the cytoplasm
on the surface of small metal particles as described (Klein et al., Nature 328:70 (1987); Klein et
al., Proc. Natl. Acad. Sci. (U.S.A.) 85:8502-8505 (1988); McCabe et al., Bio/Technology 6:923
(1988)). The metal particles penetrate through several layers of cells and thus allow the
transformation of cells within tissue explants.

The regeneration, development and cultivation of plants from single plant protoplast
transformants or from various transformed explants are well known in the art (Weissbach and
Weissbach, In: Methods for Plant Molecular Biology, Academic Press, San Diego, CA (1988)).
This regeneration and growth process typically includes the steps of selecting transformed
cells and culturing those individualized cells through the usual stages of embryonic development
through the rooted plantlet stage. Transgenic embryos and seeds are similarly regenerated. The
resulting transgenic rooted shoots are thereafter planted in an appropriate plant growth medium
such as soil.

The development or regeneration of plants containing the foreign, exogenous gene that
encodes a protein of interest is well known in the art. Preferably, the regenerated plants are self-
pollinated to provide homozygous transgenic plants. Otherwise, pollen obtained from the
regenerated plants is crossed to seed-grown plants of agronomically important lines. Conversely,
pollen from plants of these important lines is used to pollinate regenerated plants. A transgenic
plant of the invention containing a desired polypeptide is cultivated using methods well known to
one skilled in the art.

There are a variety of methods for the regeneration of plants from plant tissue. The
particular method of regeneration will depend on the starting plant tissue and the particular plant
species to be regenerated.

Methods for transforming dicots, primarily by use of Agrobacterium tumefaciens and
obtaining transgenic plants have been published for cotton (U.S. Patent No. 5,004,863; U.S.
Patent No. 5,159,135; U.S. Patent No. 5,518,908); soybean (U.S. Patent No. 5,569,834; U.S.
Patent No. 5,416,011; McCabe et al., Biotechnology 6:923 (1988); Christou et al., Plant Physiol.
87:671-674 (1988)); Brassica (U.S. Patent No. 5,463,174); peanut (Cheng et al., Plant Cell Rep.
15:653-657 (1996), McKently et al., Plant Cell Rep. 14:699-703 (1995)); papaya; and pea
(Grant et al., Plant Cell Rep. 15:254-258 (1995)).

Transformation of monocotyledons using electroporation, particle bombardment and
Agrobacterium has also been reported. Transformation and plant regeneration have been
achieved in asparagus (Bytebier et al., Proc. Natl. Acad. Sci. (U.S.A.) 84:5354 (1987)); barley
(Wan and Lemaux, Plant Physiol. 104:37 (1994)); maize (Rhodes et al., Science 240:204 (1988);
Gordon-Kamm et al., Plant Cell 2:603-618 (1990); Fromm et al., Bio/Technology 8:833 (1990);
Koziel et al., Bio/Technology 11:194 (1993); Armstrong et al., Crop Science 35:550-557
(1995)); oat (Somers et al., Bio/Technology 10:1589 (1992)); orchard grass (Horn et al., Plant
Cell Rep. 7:469 (1988)); rice (Toriyama et al., Theor. Appl. Genet. 205:34 (1986); Park et al.,
Plant Mol. Biol. 32:1135-1148 (1996); Abedinia et al., Aust. J. Plant Physiol. 24:133-141
(1997); Zhang and Wu, Theor. Appl. Genet. 76:835 (1988); Zhang et al., Plant Cell Rep. 7:379
(1988); Battraw and Hall, Plant Sci. 86:191-202 (1992); Christou et al., Bio/Technology 9:957
(1991)); rye (De la Pena et al., Nature 325:274 (1987)); sugarcane (Bower and Birch, Plant J.
2:409 (1992)); tall fescue (Wang et al., Bio/Technology 10:691 (1992)) and wheat (Vasil et al.,
Bio/Technology 10:667 (1992); U.S. Patent No. 5,631,152).

Assays for gene expression based on the transient expression of cloned nucleic acid
constructs have been developed by introducing the nucleic acid molecules into plant cells by
polyethylene glycol treatment, electroporation, or particle bombardment (Marcotte et al., Nature
335:454-457 (1988); Marcotte et al., Plant Cell 1:523-532 (1989); McCarty et al., Cell 66:895-
905 (1991); Hattori et al., Genes Dev. 6:609-618 (1992); Goff et al., EMBO J. 9:2517-2522
(1990)). Transient expression systems may be used to functionally dissect gene constructs (see
generally, Maliga et al., Methods in Plant Molecular Biology, Cold Spring Harbor Press (1995)).

Any of the nucleic acid molecules of the invention may be introduced into a plant cell in
a permanent or transient manner in combination with other genetic elements such as vectors,
promoters, enhancers, etc. Further, any of the nucleic acid molecules of the invention may be
introduced into a plant cell in a manner that allows for overexpression of the protein or fragment
thereof encoded by the nucleic acid molecule.

Cosuppression is the reduction in expression levels, usually at the level of RNA, of a
particular endogenous gene or gene family by the expression of a homologous sense construct
that is capable of transcribing mRNA of the same strandedness as the transcript of the
endogenous gene (Napoli et al., Plant Cell 2:279-289 (1990); van der Krol et al., Plant Cell
2:291-299 (1990)). Cosuppression may result from stable transformation with a single copy
nucleic acid molecule that is homologous to a nucleic acid sequence found within the cell (Prolls
and Meyer, Plant J. 2:465-475 (1992)) or with multiple copies of a nucleic acid molecule that is
homologous to a nucleic acid sequence found within the cell (Mittlesten et al., Mol. Gen. Genet.
244:325-330 (1994)). Genes, even though different, linked to homologous promoters may result
in the cosuppression of the linked genes (Vaucheret, C.R. Acad. Sci. III 316:1471-1483 (1993);
Flavell, Proc. Natl. Acad. Sci. (U.S.A.) 91:3490-3496 (1994); van Blokland et al., Plant J.
6:861-877 (1994); Jorgensen, Trends Biotechnol. 8:340-344 (1990); Meins and Kunz, In: Gene
Inactivation and Homologous Recombination in Plants, Paszkowski (ed.), pp. 335-348, Kluwer
Academic, Netherlands (1994)).

It is understood that one or more of the nucleic acids of the invention may be introduced
into a plant cell and transcribed using an appropriate promoter with such transcription resulting
in the cosuppression of an endogenous protein.

Antisense approaches are a way of preventing or reducing gene function by targeting the
genetic material (U.S. Patents 4,801,540 and 5,107,065; Mol et al., FEBS Lett. 268:427-430
(1990)). The objective of the antisense approach is to use a sequence complementary to the
target gene to block its expression and create a mutant cell line or organism in which the level of
a single chosen protein is selectively reduced or abolished. Antisense techniques have several
advantages over other 'reverse genetic' approaches. The site of inactivation and its
developmental effect can be manipulated by the choice of promoter for antisense genes or by the
timing of external application or microinjection. Antisense can manipulate its specificity by
selecting either unique regions of the target gene or regions where it shares homology to other
related genes (Hiatt et al., In: Genetic Engineering, Setlow (ed.), Vol. 11, New York: Plenum 49-
63 (1989)).

The principle of regulation by antisense RNA is that RNA that is complementary to the
target mRNA is introduced into cells, resulting in specific RNA:RNA duplexes being formed by
base pairing between the antisense substrate and the target mRNA (Green et al., Annu. Rev.
Biochem. 55:569-597 (1986)). Under one embodiment, the process involves the introduction and
expression of an antisense gene sequence. Such a sequence is one in which part or all of the
normal gene sequences are placed under a promoter in inverted orientation so that the 'wrong' or
complementary strand is transcribed into a noncoding antisense RNA that hybridizes with the
target mRNA and interferes with its expression (Takayama and Inouye, Crit. Rev. Biochem. Mol.
Biol. 25:155-184 (1990)). An antisense vector is constructed by standard procedures and
introduced into cells by transformation, transfection, electroporation, microinjection, infection,
etc. The type of transformation and choice of vector will determine whether expression is
transient or stable. The promoter used for the antisense gene may influence the level, timing,
tissue, specificity, or inducibility of the antisense inhibition.

It is understood that the activity of a protein in a plant cell may be reduced or depressed
by growing a transformed plant cell containing a nucleic acid molecule whose non-transcribed
strand encodes a protein or fragment thereof.

Post-transcriptional gene silencing (PTGS) can result in virus immunity or gene silencing
in plants. PTGS is induced by dsRNA and is mediated by an RNA-dependent RNA polymerase,
present in the cytoplasm, that requires a dsRNA template. The dsRNA is formed by
hybridization of complementary transgene mRNAs or complementary regions of the same
transcript. Duplex formation can be accomplished by using transcripts from one sense gene and
one antisense gene co-located in the plant genome, a single transcript that has
self-complementarity, or sense and antisense transcripts from genes brought together by crossing.
The dsRNA-dependent RNA polymerase makes a complementary strand from the transgene
mRNA and RNAse molecules attach to this complementary strand (cRNA). These cRNA-RNAse
molecules hybridize to the endogene mRNA and cleave the single-stranded RNA
adjacent to the hybrid. The cleaved single-stranded RNAs are further degraded by other host
RNAses because one will lack a capped 5' end and the other will lack a poly(A) tail (Waterhouse
et al., PNAS 95:13959-13964 (1998)).

It is understood that one or more of the nucleic acids of the invention may be introduced
into a plant cell and transcribed using an appropriate promoter with such transcription resulting
in the post-transcriptional gene silencing of an endogenous transcript.

Antibodies have been expressed in plants (Hiatt et al., Nature 342:76-78 (1989); Conrad
and Fiedler, Plant Mol. Biol. 26:1023-1030 (1994)). Cytoplasmic expression of a scFv
(single-chain Fv antibody) has been reported to delay infection by artichoke mottled crinkle virus.
Transgenic plants that express antibodies directed against endogenous proteins may exhibit a
physiological effect (Philips et al., EMBO J. 16:4489-4496 (1997); Marion-Poll, Trends in Plant
Science 2:447-448 (1997)). For example, expressed anti-abscisic antibodies have been reported
to result in a general perturbation of seed development (Philips et al., EMBO J. 16:4489-4496
(1997)).

Antibodies that are catalytic (abzymes) may also be expressed in plants. The principle
behind abzymes is that since antibodies may be raised against many molecules, this recognition
ability can be directed toward generating antibodies that bind transition states to force a chemical
reaction forward (Persidas, Nature Biotechnology 15:1313-1315 (1997); Baca et al., Ann. Rev.
Biophys. Biomol. Struct. 26:461-493 (1997)). The catalytic abilities of abzymes may be
enhanced by site directed mutagenesis. Examples of abzymes are set forth in U.S.
Patent No. 5,658,753; U.S. Patent No. 5,632,990; U.S. Patent No. 5,631,137; U.S. Patent
No. 5,602,015; U.S. Patent No. 5,559,538; U.S. Patent No. 5,576,174; U.S. Patent No. 5,500,358;
U.S. Patent No. 5,318,897; U.S. Patent No. 5,298,409; U.S. Patent No. 5,258,289 and U.S. Patent
No. 5,194,585.

It is understood that any of the antibodies of the invention may be expressed in plants and
that such expression can result in a physiological effect. It is also understood that any of the
expressed antibodies may be catalytic.
(d) Antibodies

One aspect of the present invention concerns antibodies, single-chain antigen binding
molecules, or other proteins that specifically bind to one or more of the protein or peptide
molecules of the present invention and their homologues, fusions or fragments. Such antibodies
may be used to quantitatively or qualitatively detect the protein or peptide molecules of the
present invention. As used herein, an antibody or peptide is said to "specifically bind" to a
protein or peptide molecule of the present invention if such binding is not competitively inhibited
by the presence of non-related molecules.

Nucleic acid molecules that encode all or part of the protein of the present invention can
be expressed, via recombinant means, to yield protein or peptides that can in turn be used to
elicit antibodies that are capable of binding the expressed protein or peptide. Such antibodies
may be used in immunoassays for that protein. Such protein-encoding molecules, or their
fragments, may be a "fusion" molecule (i.e., a part of a larger nucleic acid molecule) such that,
upon expression, a fusion protein is produced. It is understood that any of the nucleic acid
molecules of the present invention may be expressed, via recombinant means, to yield proteins or
peptides encoded by these nucleic acid molecules.









The antibodies that specifically bind proteins and protein fragments of the present
invention may be polyclonal or monoclonal and may comprise intact immunoglobulins,
antigen binding portions of immunoglobulins (such as F(ab') or F(ab')2 fragments), or single-chain
immunoglobulins producible, for example, via recombinant means. It is understood that
practitioners are familiar with the standard resource materials which describe specific conditions
and procedures for the construction, manipulation and isolation of antibodies (see, for example,
Harlow and Lane, In: Antibodies: A Laboratory Manual, Cold Spring Harbor Press, Cold Spring
Harbor, New York (1988)).

Murine monoclonal antibodies are particularly preferred. BALB/c mice are preferred for
this purpose; however, equivalent strains may also be used. The animals are preferably
immunized with approximately 25 μg of purified protein (or fragment thereof) that has been
emulsified in a suitable adjuvant (such as TiterMax adjuvant (Vaxcel, Norcross, GA)).
Immunization is preferably conducted at two intramuscular sites, one intraperitoneal site and one
subcutaneous site at the base of the tail. An additional i.v. injection of approximately 25 μg of
antigen is preferably given in normal saline three weeks later. After approximately 11 days
following the second injection, the mice may be bled and the blood screened for the presence of
anti-protein or peptide antibodies. Preferably, a direct binding Enzyme-Linked Immunoassay
(ELISA) is employed for this purpose.

More preferably, the mouse having the highest antibody titer is given a third i.v. injection
of approximately 25 μg of the same protein or fragment. The splenic leukocytes from this
animal may be recovered 3 days later and then permitted to fuse, most preferably, using
polyethylene glycol, with cells of a suitable myeloma cell line (such as, for example, the
P3X63Ag8.653 myeloma cell line). Hybridoma cells are selected by culturing the cells under
"HAT" (hypoxanthine-aminopterin-thymidine) selection for about one week. The resulting clones
may then be screened for their capacity to produce monoclonal antibodies ("mAbs"), preferably
by direct ELISA.

In one embodiment, anti-protein or peptide monoclonal antibodies are isolated using a
fusion of a protein or peptide of the present invention, or conjugate of a protein or peptide of the
present invention, as immunogens. Thus, for example, a group of mice can be immunized using
a fusion protein emulsified in Freund's complete adjuvant (e.g., approximately 50 μg of antigen
per immunization). At three week intervals, an identical amount of antigen is emulsified in
Freund's incomplete adjuvant and used to immunize the animals. Ten days following the third
immunization, serum samples are taken and evaluated for the presence of antibody. If antibody
titers are too low, a fourth booster can be employed. Polysera capable of binding the protein or
peptide can also be obtained using this method.
In a preferred procedure for obtaining monoclonal antibodies, the spleens of the
above-described immunized mice are removed and disrupted, and immune splenocytes are isolated
over a Ficoll gradient. The isolated splenocytes are fused, using polyethylene glycol, with
BALB/c-derived HGPRT (hypoxanthine guanine phosphoribosyl transferase) deficient P3X63Ag8.653
plasmacytoma cells. The fused cells are plated into 96 well microtiter plates and screened for
hybridoma fusion cells by their capacity to grow in culture medium supplemented with
hypoxanthine, aminopterin and thymidine for approximately 2-3 weeks.

Hybridoma cells that arise from such incubation are preferably screened for their capacity
to produce an immunoglobulin that binds to a protein of interest. An indirect ELISA may be
used for this purpose. In brief, the supernatants of hybridomas are incubated in microtiter wells
that contain immobilized protein. After washing, the titer of bound immunoglobulin can be
determined using, for example, a goat anti-mouse antibody conjugated to horseradish peroxidase.
After additional washing, the amount of immobilized enzyme is determined (for example
through the use of a chromogenic substrate). Such screening is performed as quickly as possible
after the identification of the hybridoma in order to ensure that a desired clone is not overgrown
by non-secreting neighbor cells. Desirably, the fusion plates are screened several times since the
rates of hybridoma growth vary. In a preferred sub-embodiment, a different antigenic form may
be used to screen the hybridoma. Thus, for example, the splenocytes may be immunized with
one immunogen, but the resulting hybridomas can be screened using a different immunogen. It
is understood that any of the protein or peptide molecules of the present invention may be used
to raise antibodies.

Such antibody molecules or their fragments may be used for diagnostic purposes. Where
the antibodies are intended for diagnostic purposes, it may be desirable to derivatize them, for
example with a ligand group (such as biotin) or a detectable marker group (such as a fluorescent
group, a radioisotope or an enzyme).

The ability to produce antibodies that bind the protein or peptide molecules of the present
invention permits the identification of mimetic compounds of those molecules. A "mimetic
compound" is a compound that is not that compound, or a fragment of that compound, but which
nonetheless exhibits an ability to specifically bind to antibodies directed against that compound.

Having now generally described the invention, the same will be more readily understood
through reference to the following examples which are provided by way of illustration, and are
not intended to be limiting of the present invention, unless specified.

EXAMPLE 1

In this example, DNA is extracted from soybean plants, amplified, and mapped.

A single trifoliate leaf is collected from the newest growth of four week old soybean
plants. The leaf tissue is placed on ice and stored at -80°C. The frozen tissue is
lyophilized, and approximately 0.01 grams of the tissue is used for DNA extraction. The 0.01
grams of leaf tissue is ground to powder in 1.4 ml tubes. 600 microliters (μl) of DNA extraction
buffer, consisting of 0.5 M NaCl, 0.1 M Tris-(hydroxymethyl) aminomethane pH 8.0, 0.05 M
ethylenediaminetetra-acetic acid (EDTA), 10.0 g L⁻¹ sodium dodecyl sulfate (SDS), and 2 g L⁻¹
phenanthroline (dissolved in 0.01 L ethanol) and heated to 65°C (with 0.77 g L⁻¹ dithiothreitol
added immediately before use), is added to each tube, and each tube is mixed thoroughly. The
samples are placed in a 65°C water bath for 15 minutes and shaken by hand after 10 minutes.
The samples are taken out of the water bath and cooled to room temperature, and then 200 μl of
5 M KOAc is added to each tube. The samples are inverted and placed at 4°C for 20 minutes.
Samples are then centrifuged for 12 minutes at 6200 X g and the supernatant (about 600 μl) is
transferred to new tubes. DNA is precipitated with 330 μl of cold isopropanol and placed at
-20°C for 1 hr. The DNA is pelleted by centrifuging at 6200 X g for 10 minutes and washed with
70% EtOH. The DNA is pelleted by centrifugation at 6200 X g for 10 minutes and dried using a
Speed-Vac. The DNA is dissolved in 100 μl of TE0.1 (0.01 M Tris-HCl pH 8.0, 0.0001 M
EDTA). The extraction will generally yield 500 ng DNA μl⁻¹.
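
As a hedged illustration of the buffer composition given above, the following Python sketch converts the stated concentrations into reagent masses for a chosen batch volume. The molecular weights are assumptions about reagent form (NaCl, Tris base, disodium EDTA dihydrate), which the text does not specify, and the function names are hypothetical. pH adjustment and dissolving the phenanthroline in ethanol are not modeled.

```python
# Sketch: reagent masses for the DNA extraction buffer described in the text.
# Molecular weights are ASSUMPTIONS about reagent form (not stated in the text).

MOLAR_COMPONENTS = {
    # name: (molarity in mol/L from the text, assumed molecular weight in g/mol)
    "NaCl": (0.5, 58.44),
    "Tris base": (0.1, 121.14),
    "EDTA (disodium, dihydrate)": (0.05, 372.24),
}

MASS_COMPONENTS = {
    # name: grams per liter, exactly as stated in the text
    "SDS": 10.0,
    "phenanthroline": 2.0,
    "dithiothreitol (added just before use)": 0.77,
}


def buffer_recipe(volume_l):
    """Return grams of each component needed for volume_l liters of buffer."""
    recipe = {}
    for name, (molarity, mol_weight) in MOLAR_COMPONENTS.items():
        recipe[name] = molarity * mol_weight * volume_l
    for name, grams_per_l in MASS_COMPONENTS.items():
        recipe[name] = grams_per_l * volume_l
    return recipe


def buffer_volume_for_samples(n_samples, ul_per_sample=600):
    """Liters of buffer needed at 600 microliters per sample tube."""
    return n_samples * ul_per_sample / 1e6


if __name__ == "__main__":
    volume = buffer_volume_for_samples(96)
    print(f"Buffer needed for 96 samples: {volume * 1000:.1f} ml")
    for name, grams in buffer_recipe(volume).items():
        print(f"  {name}: {grams:.3f} g")
```

In practice some excess volume would be prepared to allow for pipetting loss; the sketch computes only the nominal amount.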

A polymerase chain reaction (PCR) is conducted with 5 to 10 ng genomic DNA in 10 μl
volumes of 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 0.001% gelatin, 1.5 mM MgCl2, 0.1 mM of
each dNTP, 150 nM of each primer, 0.01 mM Cresol Red, 2% sucrose and 0.32 units of
AmpliTaq DNA Polymerase (Perkin Elmer Instruments Inc., USA). For thermocycling, the
GeneAmp PCR System 9700 (Perkin Elmer Instruments Inc., USA) is used with one step of
94°C for 3 minutes, then 32 cycles of 94°C, 47°C, and 72°C steps of 25 sec each and one final
step of 72°C for 3 minutes. The PCR products are run on a 6% polyacrylamide gel (30 cm X 8
cm X 1 mm) in 1X TAE (40 mM Tris-HCl, pH 8.3, 1 mM EDTA) at 180 V for 45 minutes. The
gels are stained using SYBR Gold (Molecular Probes, Eugene, OR) according to the
manufacturer's instructions.

SSR primer screening for polymorphism is performed using PIC, HS-1, Will and
PI507354 genotypes. SSRs that are polymorphic and easy to score (i.e., clear banding pattern
and good separation between alleles) are mapped using the HS-1 x PIC (F2) and/or Will x
PI507354 (RIL) mapping populations. At least one SSR per BAC sequence is mapped. DNA
markers that exhibit codominant banding patterns are scored as homozygous for one or the
other parent or as heterozygous, exhibiting both parental alleles. Marker scores are checked for
segregation distortion using the chi-squared test for goodness of fit to expected ratios. Linkage
relationships are determined using Mapmaker Version 3.0b with a LOD of 3.0 (Whitehead
Institute, Cambridge, MA).

EXAMPLE 2

DNA fragments containing candidates for genes rhg1 and Rhg4 from susceptible and
resistant soybean lines are subcloned into a TA cloning plasmid (TOPO TA Cloning Kit, Version
E, Invitrogen Corporation, 1600 Faraday Avenue, Carlsbad, CA).
Genomic DNA from 24 susceptible and 9 resistant lines is isolated using standard
techniques. Approximately 500 nanograms (ng) of DNA is used for PCR amplification.
Resistant BAC DNA is isolated by using AUTOGEN (AutoGen Corp., 35 Loring Drive,
Framingham, MA). PCR amplification is then performed using 0.1-0.2 ng of resistant BAC
DNA. The primers that are used to amplify candidate rhg1 genes by PCR are as follows:
Fragment I (2,892 bp): primer (SEQ ID NO: 25), GCA ATA CTT GAA GGA ATA TGT
CCA C; primer (SEQ ID NO: 24), beginning at start codon, ATG GAT GGT AAA AAT TCA
AAA CTA AAC; modified reverse primer 1 (SEQ ID NO: 1123), beginning 5 bp before start
codon, GTT GTA TGG ATG GTA AAA ATT CAA AAC. Fragment II (1,746 bp): reverse
primer 2 (SEQ ID NO: 27), ending at 13 bp after stop codon, GAC TGG CTG TGA CTG ATC
TCT CT; primer 2 (SEQ ID NO: 26), CTC ACT TAG ACT GGT GAA TGG AGA.

The primers for Rhg4 PCR are as follows:

Forward primer (SEQ ID NO: 48), ATG TCT CTC CCG AAA ACC CTA CTT TCT
CTG; reverse primer (SEQ ID NO: 49), ending at 2 bp after stop codon, GGT TAA GGG GAA
TGG ATT GAA TGA AAG GAG.
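
The positional notes attached to the Rhg4 primers (a forward primer starting at the start codon, a reverse primer ending 2 bp after the stop codon) can be checked with a few lines of Python. This is only an illustrative sketch: the helper names are hypothetical, and the sequences are copied from the text as printed.

```python
# Sketch: basic checks on the Rhg4 primer pair quoted in the text.
# Helper names are illustrative and not part of any described method.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

RHG4_FORWARD = "ATG TCT CTC CCG AAA ACC CTA CTT TCT CTG"  # SEQ ID NO: 48
RHG4_REVERSE = "GGT TAA GGG GAA TGG ATT GAA TGA AAG GAG"  # SEQ ID NO: 49


def clean(seq):
    """Remove the triplet spacing used in the text and upper-case the sequence."""
    return "".join(seq.split()).upper()


def reverse_complement(seq):
    """Return the reverse complement, i.e. the sense-strand reading of a reverse primer."""
    return "".join(COMPLEMENT[base] for base in reversed(clean(seq)))


def gc_fraction(seq):
    """Fraction of G and C bases in the primer."""
    s = clean(seq)
    return (s.count("G") + s.count("C")) / len(s)


if __name__ == "__main__":
    fwd = clean(RHG4_FORWARD)
    rev_sense = reverse_complement(RHG4_REVERSE)
    print("forward:", fwd, len(fwd), "nt, GC =", round(gc_fraction(fwd), 2))
    print("reverse (sense strand):", rev_sense, len(rev_sense), "nt")
    print("forward starts with ATG:", fwd.startswith("ATG"))
    print("sense strand ends with TAA + 2 bp:", rev_sense[-5:-2] == "TAA")
```

Read on the sense strand, the reverse primer ends in TAA followed by two bases, which is consistent with the "2 bp after stop codon" annotation.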

PCR amplification is performed in an MJ Research PTC DNA Engine(TM) System, Model
PTC-225 (MJ Research Inc., 590 Lincoln Street, Waltham, MA). PCR is performed using the
following components: 1 μl DNA, 5 μl 10x buffer, 1 μl primer 1, 1 μl primer 2, 10 mM dNTP,
1.5 μl 50 mM MgCl2, 0.2 μl Taq (Platinum), 39.3 μl H2O. The PCR program used is as follows:
95°C for 10 minutes (step 1), 95°C for 30 seconds (step 2), 70°C for 30 seconds, -1°C per
cycle/72°C for 3 minutes (step 3), repeat steps two through three 9 times (step 4), 95°C for 30
seconds (step 5), 60°C for 30 seconds (step 6), 72°C for 3 minutes (step 7), repeat steps five
through seven 34 times (step 8), 4°C forever (step 9), end.
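
The PCR program above reads as a touchdown protocol followed by fixed-temperature cycling. The Python sketch below expands it into a per-cycle schedule under two stated assumptions: that the annealing temperature in the first phase drops by 1°C per cycle from 70°C, and that "repeat N times" means N additional passes (giving 10 touchdown cycles and 35 fixed cycles). Function names are hypothetical, and ramp times between steps are ignored.

```python
# Sketch: expand the touchdown PCR program into an explicit per-cycle schedule.
# ASSUMPTIONS: 1 degree C decrement per touchdown cycle; "repeat N times" adds
# N passes to the first one; ramp times are not counted.

def touchdown_program(start_anneal=70.0, step=1.0, touchdown_cycles=10,
                      final_anneal=60.0, final_cycles=35):
    """Return a list of (phase, cycle, denature_C, anneal_C, extend_C) tuples."""
    cycles = []
    for i in range(touchdown_cycles):
        cycles.append(("touchdown", i + 1, 95.0, start_anneal - step * i, 72.0))
    for i in range(final_cycles):
        cycles.append(("fixed", i + 1, 95.0, final_anneal, 72.0))
    return cycles


def cycling_minutes(cycles, denature_s=30, anneal_s=30, extend_s=180,
                    initial_denature_s=600):
    """Total programmed hold time in minutes, excluding the final 4 C hold."""
    per_cycle = denature_s + anneal_s + extend_s
    return (initial_denature_s + len(cycles) * per_cycle) / 60


if __name__ == "__main__":
    program = touchdown_program()
    for phase, n, denature, anneal, extend in program[:12]:
        print(f"{phase:9s} cycle {n:2d}: {denature} / {anneal} / {extend} C")
    print("total cycles:", len(program))
    print("programmed time (min):", cycling_minutes(program))
```

Under these assumptions the last touchdown cycle anneals at 61°C, one degree above the fixed 60°C annealing temperature of the second phase.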
PCR products are separated on 1% agarose gel by electrophoresis. A single DNA band is
excised from the gel. Gel extraction is done using the CLONTECH NucleoSpin Extraction Kit
(Clontech Laboratories Inc., 1020 East Meadow Circle, Palo Alto, CA). 2 μl of purified DNA
is loaded on 1% agarose gel to check concentration. 40-100 ng of DNA is used for subcloning.

A TOPO cloning reaction is done according to the following: 4 μl of fresh PCR product,
1 μl Clontech Salt Solution, and 1 μl TOPO vector. The solution is mixed gently, incubated for
10 minutes at room temperature, and then placed on ice.

A one shot chemical transformation is performed as follows. 2 μl of the TOPO Cloning
reaction is added to a vial of TOP10 One Shot Chemically Competent E. coli and mixed gently.
The mixture is then incubated on ice for 30 minutes. The cells are then heat-shocked for 30
seconds at 42°C, and immediately transferred to ice. 250 μl of SOC medium is then added, and
the mixture is incubated at 37°C for 1 hour. 80 μl is then spread onto a selective plate, and 170
μl is spread onto another plate. The plates are incubated at 37°C for 18-20 hours. The selective
plates are LB agar plates with 100 μg/ml ampicillin, 40 μg/ml IPTG, and 40 μg/ml X-GAL.
After incubation, 8-10 white or light blue colonies are selected. The positive colonies are
inoculated into LB medium containing 50 μg/ml ampicillin and incubated at 37°C overnight.
Sterilized glycerol is added to make a 15% glycerol stock, which can be stored at -80°C.

Sanger sequencing reactions are performed on subclones using BigDye Terminators
(Applied Biosystems, 850 Lincoln Centre Drive, Foster City, CA) and then analyzed on ABI
377/ABI 3700 automated sequencing machines (Applied Biosystems, 850 Lincoln Centre Drive,
Foster City, CA). The sequences are evaluated for quality and error probability using the
program PHRED (Ewing and Green, Genome Res., 8:186-194 (1998); Ewing et al., Genome
Res., 8:175-185 (1998)), assembled using the phrap assembler and viewed using consed
(Gordon et al., Genome Res., 8:195-202 (1998)). An rhg1 candidate gene is found in BAC 240017,
and is about 4.5 kb in size. An Rhg4 candidate is found in BAC 318013, and is about 3.5 kb
in size.
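
PHRED reports base quality on a logarithmic scale, where a quality value Q corresponds to an estimated error probability of 10^(-Q/10). The short Python sketch below shows that conversion and an expected-error count for a read; the function names are hypothetical and it does not call or reproduce the PHRED program itself.

```python
# Sketch: the standard relationship between a Phred quality value and the
# estimated base-call error probability. Function names are illustrative.
import math


def phred_to_error_probability(q):
    """Error probability for quality value q: P = 10 ** (-q / 10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Quality value for error probability p: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def expected_errors(qualities):
    """Expected number of miscalled bases in a read with the given qualities."""
    return sum(phred_to_error_probability(q) for q in qualities)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(f"Q{q}: error probability {phred_to_error_probability(q):.4f}")
    print("expected errors in a 500-base read at Q20:",
          round(expected_errors([20] * 500), 2))
```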
EXAMPLE 3

The physical mapping of a QTL (quantitative trait locus) is described in this example.
Mapping is initiated with linkage analysis of SSR (simple sequence repeat) markers. Markers
that are shown to be linked to the QTL of interest are used to PCR screen the soy BAC library
and identify candidate BACs. Confirmed BACs are subcloned and sequenced, BAC-end
sequenced, and fingerprinted. New markers are designed from good BAC-end sequences and
used to screen the library, by either PCR or hybridization to high density grid filters, in order to
extend the contigs. A BAC-end sequence and fingerprint database of soy BACs is used in
conjunction with the above methods to help build and extend contigs. Sequenced BACs are
aligned, and overlapping BACs are placed into contigs. These contigs, which contain unique
sequences, are put into an ACEDB database, and predicted genes are annotated by hand using
various programs. Candidate genes (for the gene of interest) are subcloned from genomic DNA
of different lines by PCR using primers from outside the predicted coding regions. These
subclones are sequenced and screened for SNPs (single nucleotide polymorphisms) and INDELs
(insertions/deletions), and different haplotypes of the lines with and without the desired
phenotype are examined for correlations between the haplotype and phenotype.

A single trifoliate leaf is collected from the newest growth of four week old soybean
plants. The leaf tissue is placed on ice and stored at -80°C. The frozen tissue is lyophilized and
approximately 0.01 grams of tissue is used for DNA extraction. The leaf tissue is ground to
powder in 1.4 ml tubes and 600 μl of DNA extraction buffer [0.5 M NaCl, 0.1 M
Tris-(hydroxymethyl) aminomethane pH 8.0, 0.05 M ethylenediaminetetra-acetic acid (EDTA), 10.0
g L⁻¹ sodium dodecyl sulfate (SDS), 2 g L⁻¹ phenanthroline (dissolved in 0.01 L ethanol)] heated
to 65°C (with 0.77 g L⁻¹ dithiothreitol added immediately before use) is added to each tube and
mixed thoroughly. The samples are placed in a 65°C water bath for 15 minutes and shaken by
hand after 10 min. The samples are taken out of the water bath, cooled to room temperature, and
200 μl of 5 M KOAc is added to each tube. The samples are inverted and placed at 4°C for 20
min. Samples are then centrifuged for 12 minutes at 6200 X g and the supernatant (about 600 μl)
is transferred to new tubes. DNA is precipitated with 330 μl of cold isopropanol and placed at
-20°C for 1 hr. The DNA is pelleted by centrifuging at 6200 X g for 10 minutes and is washed
with 70% EtOH. The DNA is pelleted by centrifugation at 6200 X g for 10 minutes and dried
using a Speed-Vac. The DNA is dissolved in 100 μl of TE0.1 (0.01 M Tris-HCl pH 8.0, 0.0001
M EDTA). The extraction yields 500 ng DNA μl⁻¹.

The polymerase chain reaction (PCR) is conducted with 5 to 10 ng genomic DNA in 10
μl volumes of 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 0.001% gelatin, 1.5 mM MgCl2, 0.1 mM
of each dNTP, 150 nM of each primer, 0.01 mM Cresol Red, 2% sucrose and 0.32 units of
AmpliTaq DNA Polymerase (Perkin Elmer Instruments Inc., USA, 761 Main Avenue, Norwalk,
CT). For thermocycling, the GeneAmp PCR System 9700 (Perkin Elmer Instruments Inc.,
USA, 761 Main Avenue, Norwalk, CT) is used with one step of 94°C for 3 min, then 32 cycles
of 94°C, 47°C and 72°C steps of 25 sec each and one final step of 72°C for 3 min. The PCR
products are run on a 6% polyacrylamide gel (30 cm X 8 cm X 1 mm) in 1X TAE (40 mM
Tris-HCl, pH 8.3, 1 mM EDTA) at 180 V for 45 min. The gels are stained using SYBR Gold
(Molecular Probes, Eugene, OR) per the manufacturer's instructions.

SSR primer screening for polymorphisms is performed using PIC, HS-1, Will and
PI507354 genotypes. SSRs that are polymorphic and easy to score (i.e., clear banding pattern
and good separation between alleles) are mapped using the HS-1 x PIC (F2) and/or Will x
PI507354 (RIL) mapping populations. At least one SSR per BAC sequence is mapped. DNA
markers that exhibit codominant banding patterns are scored as homozygous for one or the
other parent or as heterozygous, exhibiting both parental alleles. Marker scores are checked for
segregation distortion using the chi-squared test for goodness of fit to expected ratios. Linkage
relationships are determined using Mapmaker Version 3.0b with a LOD of 3.0 (Whitehead
Institute for Biomedical Research, Cambridge, MA).
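
The segregation-distortion check described above can be sketched as a chi-squared goodness-of-fit calculation. The expected ratios used here (1:2:1 for a codominant marker in an F2 population, 1:1 for a RIL population) and the 0.05 significance level are assumptions for illustration; the text refers only to "expected ratios". The genotype counts are invented example data, and the function names are hypothetical.

```python
# Sketch: chi-squared goodness-of-fit test for marker segregation distortion.
# ASSUMPTIONS: expected ratios of 1:2:1 (F2, codominant) and 1:1 (RIL), and a
# 0.05 significance threshold; none of these values is stated in the text.

# Chi-squared critical values at alpha = 0.05, keyed by degrees of freedom.
CRITICAL_0_05 = {1: 3.841, 2: 5.991}


def chi_squared(observed, expected_ratio):
    """Chi-squared statistic of observed class counts against a ratio."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    statistic = 0.0
    for count, ratio in zip(observed, expected_ratio):
        expected = total * ratio / ratio_sum
        statistic += (count - expected) ** 2 / expected
    return statistic


def is_distorted(observed, expected_ratio):
    """True if the marker departs from the expected ratio at alpha = 0.05."""
    degrees_of_freedom = len(observed) - 1
    return chi_squared(observed, expected_ratio) > CRITICAL_0_05[degrees_of_freedom]


if __name__ == "__main__":
    f2_counts = [24, 52, 24]   # hypothetical: parent A, heterozygous, parent B
    ril_counts = [70, 30]      # hypothetical: parent A, parent B
    print("F2 chi-squared:", round(chi_squared(f2_counts, [1, 2, 1]), 3),
          "distorted:", is_distorted(f2_counts, [1, 2, 1]))
    print("RIL chi-squared:", round(chi_squared(ril_counts, [1, 1]), 3),
          "distorted:", is_distorted(ril_counts, [1, 1]))
```

Markers flagged as distorted would typically be inspected before being used in linkage analysis, since distortion can bias recombination estimates.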
Thirty-two BAC DNA superpools (10 genomic equivalents), each extracted from 4608
clones (48 96-well microtiter plates), are used as templates for the first round of PCR screening.
Following identification of the positive superpools, the second screening is performed against
4-D BAC DNA pools. Each clone of the superpool is addressed 4-dimensionally (7 X 7 X 12 X 8)
and pooled in each dimension. Each set of 48 plates is divided into 6 sets of 7 plates and one set
of 6 plates, and partitioned in two ways. The first partition is in numerical order, plates 1-7,
8-14, . . . 43-48, representing 7 group or stack pools. The second partition is according to plate
position within each of the respective stacks, plates [1, 8, 15, 22, 29, 36, 43], [2, 9, 16, 23, 30,
37, 44], etc., representing 7 plate pools. Each of the 96-well plates contains 12 columns and 8 rows.
Clones from row 1 are pooled from all 48 plates to generate the row 1 pool. Clones of rows 2, 3,
4, ... 8, and columns 1, 2, 3, ... 12 are pooled to generate 8 row pools and 12 column pools
respectively.

For each superpool, BAC DNA is extracted from a total of 34 subpools (7 + 7 + 8 + 12).
Positive clones are identified by TaqMan/PCR screening of the 34 subpools if one positive clone
is present. If more than one positive clone is present in a superpool, a third round of screening
with N^4 PCR reactions is performed.
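
The four-dimensional addressing scheme can be sketched in Python as a mapping between a clone's physical address (plate, row, column) and its four pools (stack, plate position within the stack, row, column). This is an illustrative reading of the scheme described above; the function names are hypothetical and the numbering is assumed to be 1-based. When a superpool contains N positive clones, each dimension can show up to N positive pools, so up to N^4 candidate addresses have to be tested individually.

```python
# Sketch: 4-D pool addressing for one superpool of 48 96-well plates.
# ASSUMPTIONS: 1-based numbering; stacks are consecutive runs of 7 plates
# (the last stack holds plates 43-48 only). Names are illustrative.
from itertools import product

PLATES = 48        # plates per superpool
STACK_SIZE = 7     # plates per stack
ROWS = 8
COLUMNS = 12
N_STACKS = -(-PLATES // STACK_SIZE)                     # ceiling division: 7
N_SUBPOOLS = N_STACKS + STACK_SIZE + ROWS + COLUMNS     # 7 + 7 + 8 + 12


def pool_coordinates(plate, row, column):
    """Map a clone address to its (stack, position, row, column) pools."""
    if not (1 <= plate <= PLATES and 1 <= row <= ROWS and 1 <= column <= COLUMNS):
        raise ValueError("address outside the superpool")
    stack = (plate - 1) // STACK_SIZE + 1
    position = (plate - 1) % STACK_SIZE + 1
    return stack, position, row, column


def clone_address(stack, position, row, column):
    """Map positive pools in all four dimensions back to (plate, row, column)."""
    plate = (stack - 1) * STACK_SIZE + position
    if not (1 <= position <= STACK_SIZE and 1 <= plate <= PLATES):
        raise ValueError("no such plate in the superpool")
    if not (1 <= row <= ROWS and 1 <= column <= COLUMNS):
        raise ValueError("no such well on a 96-well plate")
    return plate, row, column


def candidate_addresses(stacks, positions, rows, columns):
    """All clone addresses consistent with the positive pools in each dimension."""
    hits = []
    for combo in product(stacks, positions, rows, columns):
        try:
            hits.append(clone_address(*combo))
        except ValueError:
            continue  # e.g. stack 7, position 7 would be plate 49
    return hits


if __name__ == "__main__":
    print("subpools per superpool:", N_SUBPOOLS)
    print("plate 43, row 1, column 1 ->", pool_coordinates(43, 1, 1))
    # One positive clone: a single positive pool per dimension gives one address.
    print(candidate_addresses([2], [2], [3], [5]))
    # Two positive clones: up to 2**4 = 16 candidate addresses to resolve.
    print(len(candidate_addresses([1, 2], [3, 4], [5, 6], [7, 8])))
```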

Addresses of candidate BACs are identified, and the candidates are streaked out for
single colony isolation and grown overnight at 37°C. A single, isolated colony is picked and
streaked out and grown overnight at 37°C. PCR is repeated for the marker of interest (using the
program designed for the relevant marker) using a smear of cells from the plate streaked from a
single colony. The PCR product is run on a 2% agarose gel and purified using the Clontech
NucleoSpin Gel Extraction Kit (according to the manufacturer's instructions, Clontech
Laboratories Inc., 1020 East Meadow Circle, Palo Alto, CA), and 10-50 ng of the purified DNA
are added to 10 pmol of each primer (forward and reverse), in a total volume of 6 μl of ddH2O
and 2 μl of BigDye Terminators (Applied Biosystems, 850 Lincoln Centre Drive, Foster City,
CA). The cycling conditions are: 96°C for 1 minute (step 1), 96°C for 10 seconds (step 2), 50°C
for 5 seconds (step 3), 60°C for 4 minutes (step 4), steps 2-4 are repeated for 24 cycles (step 5),
and hold at 4°C.

The generated sequence is compared to the consensus sequence using DNA comparison
software. Confirmed clones are subcloned, sequenced, BAC-end sequenced, and fingerprinted.

BAC-end sequencing is done using 32 pmol of SP6 and T7 primers (separately),
approximately 600 ng to 1 μg of BAC DNA (AutoGen prepped; AutoGen Corp., 35 Loring Drive,
Framingham, MA) per reaction, resuspended in 6 μl of ddH2O, and 4 μl of BigDye Terminators
(Applied Biosystems, 850 Lincoln Centre Drive, Foster City, CA) to give a total reaction volume
of 10 μl. The cycling conditions are: 96°C for 2 minutes (step 1), 96°C for 15 seconds (step 2),
50°C for 15 seconds (step 3), 60°C for 4 minutes (step 4), steps 2-4 are repeated for 50-60 cycles
(step 5), 72°C for 2 minutes (step 6), hold at 4°C or 10°C (step 7).

The reactions are ethanol precipitated and loaded on capillary sequencers. The newly
generated BAC-end sequence is trimmed from the vector sequence, and entered into a database
containing approximately 400,000 BAC-end sequences. Each BAC is blasted against the
database to search for BAC-end matches for extension of the contigs. New markers are designed
from good BAC-end sequences, and these are then used to rescreen the library in order to build
up contigs across the region of interest. Screening can be done in either of two ways: as above
(PCR strategy), or by hybridization of high-density grid filters from Research Genetics
(Research Genetics, 2130 Memorial Parkway, Huntsville, AL).
The probes used for hybridization are derived from clones or genomic DNA by PCR
amplification using the vector or gene-specific primers, with the appropriate cycling conditions.
PCR products are run on a 1% agarose gel containing ethidium bromide (0.2 μg/ml) in 1X TAE
buffer at 100 volts for 1-2 hrs. Isolated DNA fragments are excised and gel-purified using the
Clontech NucleoSpin gel extraction kit (Clontech Laboratories Inc., 1020 East Meadow Circle,
Palo Alto, CA), before labeling. In order to check the size of the fragments and concentration, 2
μl of eluted DNA plus loading buffer are loaded on a 1% agarose gel along with DNA markers
of known concentration and size. All the probes used to screen the library are tested individually
for repetitiveness, with a smaller filter spotted with random clones from the library along with
some positive control clones according to the protocol described below.

The A3244 soy library generated by an EcoRI digest is spotted on 3 high density grid
filters from Research Genetics (Research Genetics, 2130 Memorial Parkway, Huntsville, AL).
Each filter has six fields; twelve 384-well plates are spotted in each field in duplicate, for a total
of 27,648 clones spotted on each filter. The plates are spotted in a 5 X 5 grid (12 clones per 5 X 5
grid) pattern within each field. Each clone is spotted in duplicate with a specific orientation
within the 5 X 5 grid, which, together with the field position, gives information about its address.
In a first round hybridization procedure, multiple probes are labeled separately and then pooled
together to hybridize to BAC filters. Positive BACs identified in this procedure are
deconvoluted by rehybridization with the individual probes.
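
The filter layout figures above can be cross-checked with simple arithmetic. The Python sketch below is illustrative only: the constant and function names are hypothetical, and it assumes that each 5 X 5 grid holds one clone from each of the twelve plates in a field, spotted twice.

```python
# Sketch: arithmetic check of the high-density filter layout described in the text.
# ASSUMPTION: each 5 x 5 grid holds one clone from each of the 12 plates of a
# field, spotted in duplicate. Names are illustrative.

FIELDS_PER_FILTER = 6
PLATES_PER_FIELD = 12
WELLS_PER_PLATE = 384
GRID_SIDE = 5
REPLICATES = 2
FILTERS = 3


def clones_per_filter():
    """Distinct clones spotted on one filter."""
    return FIELDS_PER_FILTER * PLATES_PER_FIELD * WELLS_PER_PLATE


def spots_per_filter():
    """Physical spots on one filter, counting duplicates."""
    return clones_per_filter() * REPLICATES


def unused_positions_per_grid():
    """Grid positions left empty once 12 clones are spotted in duplicate."""
    return GRID_SIDE * GRID_SIDE - PLATES_PER_FIELD * REPLICATES


if __name__ == "__main__":
    print("clones per filter:", clones_per_filter())
    print("spots per filter:", spots_per_filter())
    print("unused positions per 5 x 5 grid:", unused_positions_per_grid())
    print("clones across", FILTERS, "filters:", FILTERS * clones_per_filter())
```

The computed 27,648 clones per filter agrees with the figure stated in the text.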

A hybridization oven is set at 65°C, and Church Buffer (0.5 M Sodium Phosphate, pH
7.0, 7% SDS, 1% bovine serum albumin, 1 mM EDTA, 100 μg/ml salmon sperm DNA) is
prewarmed to 65°C. Membranes are washed in 500 ml of 0.1X SSC, 0.1% SDS in a large
container at room temperature for 5 minutes with gentle shaking (50 rpm) on a rotary shaker.
The membranes are rinsed with 500 ml of 0.1X SSC (no SDS) for 1 minute. The wash solution
is poured off, and 500 ml of 6X SSC (no SDS) is added to equilibrate the membranes. Three
filters are placed in a tube. The filters are separated from each other and the sides of the tube by
a layer of mesh. Each tube is filled with 6X SSC and shaken gently with the tube vertical to help
eliminate bubbles between the filters and tube wall. The 6X SSC solution is poured off, and 25
ml of pre-warmed Church buffer is added. The bottles are rotated in a hybridization oven at 60
rpm and 65°C for 30 minutes or longer.

Probes are labeled using 1 µl of 40-50 µCi/µl [α-32P]dCTP, 50 ng of purified DNA in 49
µl of ddH2O, and Ready-To-Go Labeling Beads from Amersham Pharmacia according to the
manufacturer's instructions (Amersham Pharmacia, Uppsala, Sweden). The probes are purified
using the Bio-Spin Column P30 from BioRad according to the manufacturer's instructions (Bio-Rad
Laboratories, 3316 Spring Garden Street, Philadelphia, PA). 1 µl of the column-purified
probe is added to a minipoly-Q vial (liquid scintillation vial) for each probe. 5 ml of scintillation
liquid is added to each vial, and radiation activity for each vial is measured using a liquid
scintillation counter.

After the probes are purified and counted for radioactivity, 10-20 probes and one control
probe (from 50 µl reaction) are pooled with 10 cpm/probe each, into one 1.5 ml eppendorf tube.
The pooled probes are denatured at 99°C in a sand heating block for 10 minutes. The tubes are
cooled on ice or ice water for about 2 minutes, and then spun down at 14,000 rpm for 30 seconds in
a microcentrifuge. The hybridization tubes are pre-hybridized in 25 ml of Church buffer for at least
30 minutes, which is then poured off. 40 ml of fresh hybridization solution (pre-warmed Church
buffer) is added. The pooled-probe solution is added to the hybridization tube. The tube is rotated
in the hybridization oven at 60 rpm, 65°C overnight.
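
Because 1 µl of each purified probe is counted, the reading for each vial can be treated as counts per minute per microliter, and the volume of each probe to add to the pool follows by division. The Python sketch below illustrates that arithmetic only; the function name, the probe names, and the numeric values are hypothetical, and the per-probe target is left as a parameter.

```python
def pooling_volumes(counts_cpm_per_ul, target_cpm):
    """Return the volume (in µl) of each probe that carries target_cpm counts.

    counts_cpm_per_ul: dict mapping probe name -> scintillation reading for
        1 µl of column-purified probe (cpm per µl).
    target_cpm: desired counts per minute contributed by each probe.
    """
    volumes = {}
    for probe, cpm_per_ul in counts_cpm_per_ul.items():
        if cpm_per_ul <= 0:
            raise ValueError(f"{probe}: measured count must be positive")
        volumes[probe] = target_cpm / cpm_per_ul
    return volumes


if __name__ == "__main__":
    # Hypothetical readings and target, for illustration only.
    readings = {"probe_A": 200_000, "control": 500_000}
    print(pooling_volumes(readings, target_cpm=1_000_000))
```

A volume larger than what remains of the 50 µl labeling reaction would indicate that the probe labeled too weakly to reach the chosen target.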

The probe solution is poured off, 30 ml of pre-warmed (65°C) 1X SSC, 0.1% SDS
washing solution is added to the hybridization tube, the hybridization tube is rotated in the
hybridization oven (at 65°C) for 15 minutes, and the process is repeated two times. At the last
wash, the tube is turned 180° and rotated at the same speed for 15 minutes at 65°C. The washing
solution is poured off, and 2X SSC (no SDS) is added.

Excess liquid is removed from each filter by placing the filter on a piece of 3MM paper.

The washed filter is placed on developed film with the DNA-side up (the side BACs were
spotted on), covered with Saran wrap, and squeezed to force out liquid and bubbles. The Saran
wrap is folded to the other side of the film, fixed with tape, and then dried with Kimwipes. The
wrapped filters are placed into a film cassette with the DNA-side up (the side BACs were spotted
on), which is placed on BioMax MS film (Biomax Technologies Inc., Vancouver, BC, Canada)
in a darkroom, and exposed overnight at room temperature without an intensifying screen. Film
is developed with a film developer in the darkroom the next day, and each film is labeled with
filter number, probe used for hybridization, exposure time, and date.

Starting from Field 3, a 384-well grid is put on the field with the A1 position of the grid
on the upper right, and the grid is aligned to the image. The row and column position for each
positive clone is determined and recorded on the BAC recording spreadsheet. The pattern of the
hybridization signal is matched to known patterns. There are 6 plate reference numbers for each
of twelve patterns, which are arranged in the same manner as the 6 fields. Based on the signal
pattern and field number, a plate reference number is determined for each positive clone. The
grid is moved to the next field and the process is repeated. The original plate number (P) is
determined using the following formula: P = (N-1) x 72 + R, where N is the filter number on
which the identified clone is present and R is the plate reference number previously determined.
The complete address of the identified clone is given by the original plate number plus its
position on the plate determined previously. BACs' addresses are identified and converted to
"imp" files according to a Q-bot file format.

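The plate number formula P = (N-1) x 72 + R can be written out as a short Python sketch. The function names are hypothetical, and the range checks assume that plate reference numbers run from 1 to 72 (6 fields x 12 patterns) and that well positions follow the usual 384-well convention of rows A-P and columns 1-24; neither range is stated explicitly in the protocol.

```python
PLATES_PER_FILTER = 72  # 6 fields x 12 plates per field, from the text


def original_plate_number(filter_number, plate_reference):
    """Apply P = (N - 1) x 72 + R.

    filter_number (N): 1-based number of the filter carrying the clone.
    plate_reference (R): plate reference number read from the signal
        pattern and field; assumed here to lie in 1..72.
    """
    if filter_number < 1:
        raise ValueError("filter number must be 1 or greater")
    if not 1 <= plate_reference <= PLATES_PER_FILTER:
        raise ValueError("plate reference number must be in 1..72")
    return (filter_number - 1) * PLATES_PER_FILTER + plate_reference


def clone_address(filter_number, plate_reference, row, column):
    """Return (original plate number, row letter, column number) for a clone."""
    if row not in "ABCDEFGHIJKLMNOP" or len(row) != 1:
        raise ValueError("row must be a single letter A-P")
    if not 1 <= column <= 24:
        raise ValueError("column must be in 1..24")
    return (original_plate_number(filter_number, plate_reference), row, column)


if __name__ == "__main__":
    # Hypothetical positive clone on filter 3, plate reference 5, well C11.
    print(clone_address(3, 5, "C", 11))
```

With three filters, this numbering runs from plate 1 (filter 1, reference 1) to plate 216 (filter 3, reference 72).
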
24 working plates are loaded into a Q-bot (Genetix, Queensway, New Milton,
Hampshire, United Kingdom) 6-high hotel and media-filled 96-well plates are placed on the
deck. The Q-bot is run following the standard manual using the program called "Rearraying98"
with the settings given in Appendix III of the accompanying manual: BAC-Picking. Plates
containing picked clones are placed in a shaker incubator and grown overnight at 37°C at 200
rpm.

35 µl of DNA solution is transferred from 96-well plates into a 384-well plate using a
Platemate such that four 96-well plates of DNA are combined into one 384-well plate. The 384-pin
head (puck) is washed in 10% SDS solution for 5 minutes, ultrasonicated in a water bath for 3
minutes, washed with 70% ethanol for 1 min., and air dried for 3 minutes. The 384-well DNA
source plates and membranes are arranged on the deck according to the instructions from the
manual and the spotted grid design chosen for the membrane. Spotting patterns are designed so
that there is one control probe at each of the 4 corners of the membrane. An asymmetric pattern
is used to orient filters. The control probe concentration is about 5 ng/µl. Zeus is run according
to instructions. If the DNA concentration is lower than 5 ng/µl, the Zeus is run a second time to
double the amount of spotted DNA on the membrane. One of the empty spots is spot dyed, if
available, using one 384-well dye plate. If an empty spot is not available, the dye is printed on one
of the DNA spots. This spot marks the position for cutting filters into small membranes (9X12 cm).
Membranes are interleaved between 3M papers and left to air-dry. Each corner of each
membrane is marked with a permanent marker and numbered. Filters are denatured on the
surface of 3M paper soaked with denaturation solution for 4 minutes, and then neutralized on
the surface of 3M paper soaked with neutralization solution for 5 minutes. The filters are
washed with 2X SSC for 5 minutes and then air dried. The filters are then baked at 80°C for 1 hr.
and cut into individual small membranes (9X12 cm) according to the marked corner.
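
The consolidation of four 96-well plates into one 384-well plate can be tracked with a simple well mapping. The Python sketch below assumes the common interleaved quadrant layout, in which each 96-well plate occupies every second row and column of the 384-well plate; the protocol does not state which layout the Platemate uses, so the mapping and the function name are assumptions made for illustration.

```python
import string


def well_384(quadrant, well_96):
    """Map a 96-well position to its 384-well position.

    quadrant: 0..3, the index of the 96-well source plate (assumed order).
    well_96: position such as 'A1' or 'H12' on the 96-well plate.
    Returns the 384-well position, e.g. 'B1'.
    """
    row96 = string.ascii_uppercase.index(well_96[0].upper())
    col96 = int(well_96[1:]) - 1
    if not (0 <= quadrant <= 3 and 0 <= row96 < 8 and 0 <= col96 < 12):
        raise ValueError("quadrant or well position out of range")
    # Interleave: the quadrant selects the row and column offset (0 or 1).
    row384 = 2 * row96 + quadrant // 2
    col384 = 2 * col96 + quadrant % 2
    return f"{string.ascii_uppercase[row384]}{col384 + 1}"


if __name__ == "__main__":
    for q in range(4):
        print(q, well_384(q, "A1"), well_384(q, "H12"))
```

Under this assumed layout every one of the 384 destination wells receives exactly one source well, so a positive spot can be traced back to a single 96-well plate and position.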

To confirm and deconvolute, hybridizations are done as before, but with newly generated
filters, and each probe is done separately with a single filter using the smaller tube. 15 ml of
Church buffer is used for the hybridization.

Fingerprints are generated by digesting the BAC DNA with HindIII for 3 hours at 37°C
and running the reaction on a 0.8% gel at 200V for 19 hours. The gels are stained with
SybrGreen, while shaking at room temperature for 45 minutes, and scanned with a FluorImager.
The bands are sized using Frag software and the fingerprints are assembled into contigs within
FPC. Every time new clones are added the contigs are rebuilt using a tolerance of 10 and a
cutoff of 10'^
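
The roles of the tolerance and cutoff settings can be illustrated with a simplified Python sketch. It counts bands shared between two fingerprints within the tolerance and computes a binomial coincidence probability in the spirit of the Sulston score commonly associated with FPC. The exact formula applied by FPC is not given in the text and is assumed here, and the function names, the gel_length parameter, and the example values (including the cutoff) are hypothetical.

```python
from math import comb


def shared_bands(bands_a, bands_b, tolerance):
    """Greedily count bands of A that pair with a distinct band of B
    whose position differs by no more than tolerance."""
    a, b = sorted(bands_a), sorted(bands_b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches


def coincidence_score(n_low, n_high, matches, tolerance, gel_length):
    """Probability that at least `matches` of n_low bands coincide by chance
    with one of n_high bands (assumed Sulston-style binomial model)."""
    p = 1.0 - (1.0 - 2.0 * tolerance / gel_length) ** n_high
    return sum(
        comb(n_low, m) * p**m * (1.0 - p) ** (n_low - m)
        for m in range(matches, n_low + 1)
    )


def clones_overlap(bands_a, bands_b, tolerance, gel_length, cutoff):
    """Call two clones overlapping when the coincidence score is at or below cutoff."""
    n_low, n_high = sorted((len(bands_a), len(bands_b)))
    matches = shared_bands(bands_a, bands_b, tolerance)
    return coincidence_score(n_low, n_high, matches, tolerance, gel_length) <= cutoff


if __name__ == "__main__":
    # Hypothetical band positions; cutoff and gel length are illustrative only.
    clone_1 = [100 * k for k in range(1, 21)]
    clone_2 = [100 * k + 3 for k in range(1, 21)]
    print(clones_overlap(clone_1, clone_2, tolerance=10, gel_length=4000, cutoff=1e-10))
```

A smaller cutoff demands more shared bands before two clones are joined, which is why the contigs are rebuilt with fixed settings each time clones are added.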

Subclones are generated and Sanger sequencing reactions are performed on randomly
chosen subclones using BigDye Terminators (Applied Biosystems, 850 Lincoln Centre Drive,
Foster City, CA), then analyzed on ABI 377/ABI 3700 automated sequencing machines (Applied
Biosystems, 850 Lincoln Centre Drive, Foster City, CA). 7-8 fold sequence coverage is thereby
generated across the BAC. The sequences are evaluated for quality and error probability using
the program phred, assembled using the phrap assembler, and viewed using consed, as in
Example 2. For Bermuda standard BACs, all contigs are ordered and oriented and all gaps are
closed using a directed primer walking strategy. A final quality value of phred40 (1 base error in
10,000 bases) with no gap regions, double coverage or two chemistries across single stranded
areas is achieved.
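
The relationship between a phred quality value and the error rate quoted above follows from the standard definition Q = -10 log10(P), where P is the probability that a base call is wrong. The Python sketch below shows that conversion together with a rough estimate of the number of reads needed for a given fold coverage; the function names, the BAC length, and the read length are hypothetical values chosen for illustration.

```python
import math


def phred_to_error_probability(quality):
    """Convert a phred quality value Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-quality / 10)


def error_probability_to_phred(probability):
    """Convert an error probability P to a phred quality value Q = -10 log10(P)."""
    return -10 * math.log10(probability)


def reads_for_coverage(insert_length_bp, read_length_bp, fold_coverage):
    """Estimate the number of reads needed to reach a given fold coverage."""
    return math.ceil(fold_coverage * insert_length_bp / read_length_bp)


if __name__ == "__main__":
    # phred40 corresponds to 1 error in 10,000 bases.
    print(phred_to_error_probability(40))
    # Hypothetical 100 kb BAC sequenced with 500 bp reads at 8-fold coverage.
    print(reads_for_coverage(100_000, 500, 8))
```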

The sequence contigs are put into an ACEDB database along with soy EST and plant
EST matches, along with Blastx, Tblastx, and Plant Blastn hits. Genemark.hmm is used to
predict possible genes, and GeneFinder is used to predict splicing sites, ORFs, potential coding
regions, as well as start and stop codons. The contigs are then annotated by hand, and predicted
genes are accepted, edited, or modified based on the characteristics present in the sequence and
matches to protein, nucleotide, and EST databases.

The high-density BAC library membranes used for hybridization are made by Research
Genetics (Research Genetics, 2130 Memorial Parkway, Huntsville, AL), using a modified Q-bot
(Genetix, Queensway, New Milton, Hampshire, United Kingdom). 384-well plates containing
BACs are spotted onto 22 cm X 22 cm Hybond N+ membranes (Amersham Pharmacia, Uppsala,
Sweden). Bacteria from 72 plates are spotted twice onto one membrane, giving 55,296 colonies
in total, or 27,648 unique clones per membrane. The plates are spotted into six "fields" per 
membrane, with each field having 12 plates spotted in duplicate. This spotting format results in 
six fields with 384 grids in each field. Each grid is a 5X5 matrix containing 12 unique clones in 
duplicate, with the center position left empty. The two positions occupied by each clone in 
duplicate are designed to give a unique pattern that indicates the plate location of each clone. 
After spotting, the bacteria on the membrane are incubated for 8 hours on LB-agar plates 
containing 12.5 µg/ml chloramphenicol. The membranes are then denatured, neutralized,
washed in a standard procedure, UV-light crosslinked, and air-dried. The membranes can be 
stored and shipped at room temperature. 
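
The membrane layout figures given above are mutually consistent, as the following Python sketch of the arithmetic shows. The constant and function names are hypothetical; all numeric inputs are taken from the description of the spotting format.

```python
FIELDS_PER_MEMBRANE = 6
PLATES_PER_FIELD = 12
WELLS_PER_PLATE = 384
REPLICATES = 2          # each clone is spotted in duplicate
GRID_SIDE = 5           # each grid is a 5X5 matrix
CLONES_PER_GRID = 12    # unique clones per grid


def membrane_layout():
    """Derive the plate, clone, and colony counts for one membrane."""
    plates = FIELDS_PER_MEMBRANE * PLATES_PER_FIELD
    unique_clones = plates * WELLS_PER_PLATE
    colonies = unique_clones * REPLICATES
    # One grid per well position, holding that well from each of 12 plates.
    grids_per_field = WELLS_PER_PLATE
    occupied_per_grid = CLONES_PER_GRID * REPLICATES
    empty_per_grid = GRID_SIDE * GRID_SIDE - occupied_per_grid
    return {
        "plates": plates,
        "unique_clones": unique_clones,
        "colonies": colonies,
        "grids_per_field": grids_per_field,
        "occupied_per_grid": occupied_per_grid,
        "empty_per_grid": empty_per_grid,
    }


if __name__ == "__main__":
    for name, value in membrane_layout().items():
        print(name, value)
```

The single empty position per grid corresponds to the center of the 5X5 matrix, which is left unspotted.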

Every reference, patent, or other published work cited above is herein incorporated by
reference in its entirety.